Software Development with Code Maps

Robert DeLine, Gina Venolia, and Kael Rowan, "Software Development with Code Maps", Communications of the ACM, Vol. 53 No. 8, Pages 48-54, 10.1145/1787234.1787250

Getting lost in a large code base is altogether too easy. The code consists of many thousands of symbols, with few visual landmarks to guide the eye. As a developer navigates the code, she follows hyperlinks, such as jumping from a method caller to a callee, with no visual transition to show where the jump landed. ... Better support for code diagrams in the development environment could support code understanding and communication, and could serve as a "map" to help keep developers oriented. ... Our goal is to integrate maps into the development environment such that developers can carry out most tasks within the map.

Although the focus of this article is largely on "Code Map as UI", there are hints of the possibility that we might eventually see "Code Map as Language Element" (for example, the comment that "An important lesson from the Oahu research is that developers assign meaning to the spatial layout of the code. Code Canvas therefore takes a mixed initiative approach to layout. The user is able to place any box on the map through direct manipulation..."). The same ideas will of course be familiar to anyone who has worked with environments like Simulink, which provide a combination of diagrammatic structuring and textual definition of algorithms. But in the past such environments have only really been found in specific application domains -- control systems and signal processing in the case of Simulink -- while the Code Map idea seems targeted at more general-purpose software development. Is the complexity of large software systems pushing us towards a situation in which graphical structures like Code Maps will become a common part of the syntax of general-purpose programming languages?

Eff - Language of the Future

This is just a series of blog posts so far, as far as I can tell. But Andrej Bauer's work has been mentioned here many times, I am finding these posts extremely interesting, and I'm sure I will not be alone. So without further ado...

Programming With Effects. Andrej Bauer and Matija Pretnar.

I just returned from Paris where I was visiting the INRIA πr² team. It was a great visit, everyone was very hospitable, the food was great, and the weather was nice. I spoke at their seminar where I presented a new programming language eff which is based on the idea that computational effects are algebras. The language has been designed and implemented jointly by Matija Pretnar and myself. Eff is far from being finished, but I think it is ready to be shown to the world. What follows is an extended transcript of the talk I gave in Paris. It is divided into two posts. The present one reviews the basic theory of algebras for a signature and how they are related to computational effects. The impatient readers can skip ahead to the second part, which is about the programming language.

Omega - Language of the Future

When I discovered Tim Sheard's Languages of the Future, I realized that PLs do indeed have a future (beyond asymptotically approaching CLOS and/or adding whimsical new rules to your type checker). Compared to languages like Lisp, pre-generics Java, and Python, the "futuristic" languages like Haskell and O'Caml seemed to mainly offer additional static verification, and some other neat patterns, but the "bang for the buck" seemed somewhat low, especially since these languages have their own costs (they are much more complex, they rule out many "perfectly fine" programs).

Ωmega seems like a true revolution to me - it shows what can be done with a really fancy typesystem, and this seems like the first astounding advancement over existing languages, from Python to Haskell. Its mantra is that it's possible to reap many (all?) benefits of dependent programming, without having to suffer its problems, by adding two much more humble notions to the widely understood, ordinary functional programming setting: GADTs + Extensible Kinds.

Sheard and coworkers show that these two simple extensions allow the succinct expression of many dependently-typed and related examples from the literature. Fine examples are implementations of AVL and red-black trees that are always balanced by construction - it's simply impossible to create an unbalanced tree; this is checked by the type-system. It seems somewhat obvious that sooner than later all code will be written in this (or a similar) way.

How to understand this stuff: my route was through the generics of Java and C# (especially Featherweight Generic Java, FGJω, and A. Kennedy's generics papers). Once you understand basic notions like type constructors, variance, and kinds, you know everything to understand why GADTs + Extensible Kinds = Dependent Programming (and also esoteric stuff like polytypic values have polykinded types for that matter).

It is my belief that you must understand Ωmega now! Even if you're never going to use it, or something like it, you'll still learn a whole lot about programming. Compared to Ωmega, other languages are puny. ;P

Thorn

Thorn is

a dynamically-typed concurrent language in which lightweight isolated processes communicate by message passing. Thorn includes powerful aggregate data types, a class-based object system, first-class functions, an expressive module system, and a variety of features supporting the gradual evolution of prototype scripts into robust programs.

Thorn is implemented by a compiler targeting the JVM and a Java interpreter, and syntactically resembles Scala, at least superficially.

One of those "features" is a unique (as far as I know) soft type system:

In Thorn, the type dyn (for dynamic) is assumed as default (and never written explicitly). At the other extreme, Thorn supports concrete types, as used in statically typed programming languages. A variable of a concrete type T is guaranteed to refer to a value of that type (or a subtype). [...] While concrete types help with performance and correctness, they introduce restrictions on how software can be used and make rapid development more difficult; scripting languages do not favor them.

As an intermediate step between the two, we propose like types, getting some of the safety of concrete types while retaining the flexibility of dynamic types. Concrete types for var x:T or fun f(x:T) are used in two main places. At a method call x.m(), a static type check ensures that x actually has an m method. At a binding or assignment, like x := y; or f(y), a static type check can ensure that y's value makes sense to assign to x, can reject it entirely, or can inspire a dynamic check. Like types, var x: like T or fun f(x:like T), give the expressive power of concrete type checks on method calls, but do not constrain binding or
assignment. They do require runtime checks and thus may cause programs to fail with runtime type errors: sometimes fewer and never more than dynamic types do.

Concurrency is also a little odd:

Every component (marked by the keyword spawn) runs in a different JVM. Component handles contains sufficient information to identify the node and port on which the component runs.

A couple of papers are linked to the home page; "Thorn - Robust, Concurrent, Extensible Scripting on the JVM", by Bard Bloom, et. al., is a general description of the language, from which come the quotes above; and "Integrating Typed and Untyped Code in a Scripting Language", by Tobias Wrigstad, et. al., with more information about like types.

I have not seen Thorn here before. Apologies if I have just missed it.

FunLoft reactive, concurrent programming language

in the department of yet-another-take-on-reining-in-concurrency-for-mere-mortals, the FunLoft programming language ("Functional Language over Fair Threads") from those braininacs at inria:

FunLoft is an experimental language for concurrent programming, designed with the following objectives:

* make concurrent programming simpler by providing a framework with a clear and sound semantics.
* provide a safe language, in which, for example, data-races are impossible.
* control the use of resources (CPU and memory); for example, memory leaks cannot occur in FunLoft programs, which always react in finite time.
* have an efficient implementation which can deal with large numbers of concurrent components.
* benefit from the real parallelism offered by multicore machines.

FunLoft is safe. In particular, a static analysis ensures that programs are free from data-races and from memory leaks. More specifically, there always exists a bound on the memory used by any program: if the available memory is larger than the bound, no out-of-memory error can occur. The system does not compute the actual value of the bound, but only test for its existence.

Fortifying Macros

Fortifying Macros. Ryan Culpepper, Matthias Felleisen, ICFP 2010.

Existing macro systems force programmers to make a choice between clarity of specification and robustness. If they choose clarity, they must forgo validating significant parts of the specification and thus produce low-quality language extensions. If they choose robustness, they must write in a style that mingles the implementation with the specification and therefore obscures the latter. This paper introduces a new language for writing macros. With the new macro system, programmers naturally write robust language extensions using easy-to-understand specifications. The system translates these specifications into validators that detect misuses—including violations of context-sensitive constraints—and automatically synthesize appropriate feedback, eliminating the need for ad hoc validation code.

Presents syntax-parse, which seems like a truly great advancement over existing macro writing facilities.

Is Transactional Programming Actually Easier?

Is Transactional Programming Actually Easier?, WDDD '09, Christopher J. Rossbach, Owen S. Hofmann, and Emmett Witchel.

Chip multi-processors (CMPs) have become ubiquitous, while tools that ease concurrent programming have not. The promise of increased performance for all applications through ever more parallel hardware requires good tools for concurrent programming, especially for average programmers. Transactional memory (TM) has enjoyed recent interest as a tool that can help programmers program concurrently.

The TM research community claims that programming with transactional memory is easier than alternatives (like locks), but evidence is scant. In this paper, we describe a user-study in which 147 undergraduate students in an operating systems course implemented the same programs using coarse and fine-grain locks, monitors, and transactions. We surveyed the students after the assignment, and examined their code to determine the types and frequency of programming errors for each synchronization technique. Inexperienced programmers found baroque syntax a barrier to entry for transactional programming. On average, subjective evaluation showed that students found transactions harder to use than coarse-grain locks, but slightly easier to use than fine-grained locks. Detailed examination of synchronization errors in the students’ code tells a rather different story. Overwhelmingly, the number and types of programming errors the students made was much lower for transactions than for locks. On a similar programming problem, over 70% of students made errors with fine-grained locking, while less than 10% made errors with transactions.

I've recently discovered the Workshop on Duplicating, Deconstructing, and Debunking (WDDD) and have found a handful of neat papers, and this one seemed especially relevant to LtU.

[Edit: Apparently, there is a PPoPP'10 version of this paper with 237 undergraduate students.]

Also, previously on LtU:

Transactional Memory versus Locks - A Comparative Case Study

Despite the fact Tommy McGuire's post mentions Dr. Victor Pankratius's talk was at UT-Austin and the authors of this WDDD'09 paper represent UT-Austin, these are two independent case studies with different programming assignments. The difference in assignments is interesting because it may indicate some statistical noise associated with problem domain complexity (as perceived by the test subjects) and could account for differences between the two studies.

Everyone always likes to talk about usability in programming languages without trying to do it. Some claim it can't even be done, despite the fact Horning and Gannon did work on the subject 3+ decades ago, assessing how one can Language Design to Enhance Program Reliability. This gives a glimpse both on (a) why it is hard (b) how you can still try to do usability testing, rather than determine the truthiness of a language design decision.

Joe Duffy: A (brief) retrospective on transactional memory

A (brief) retrospective on transactional memory, by Joe Duffy, January 3rd, 2010. Although this is a blog post, don't expect to read it all on your lunch break...

The STM.NET incubator project was canceled May 11, 2010, after beginning public life July 27, 2009 at DevLabs. In this blog post, written 4 months prior to its cancellation, Joe Duffy discusses the practical engineering challenges around implementing Software Transactional Memory in .NET. Note: He starts off with a disclaimer that he was not engaged in the STM.NET project past its initial working group phase.

In short, Joe argues, "Throughout, it became abundantly clear that TM, much like generics, was a systemic and platform-wide technology shift. It didn’t require type theory, but the road ahead sure wasn’t going to be easy." The whole blog post deals with how many implementation challenges platform-wide support for STM would be in .NET, including what options were considered. He does not mention Maurice Herlihy's SXM library approach, but refers to Tim Harris's work several times.

There was plenty here that surprised me, especially when you compare Concurrent Haskell's STM implementation to STM.NET design decisions and interesting debates the team had. In Concurrent Haskell, issues Joe raises, like making Console.WriteLine transactional, are delegated to the type system by the very nature of the TVar monad, preventing programmers from writing such wishywashy code. To be honest, this is why I didn't understand what Joe meant by "it didn't require type theory" gambit, since some of the design concerns are mediated in Concurrent Haskell via type theory. On the other hand, based on the pragmatics Joe discusses, and the platform-wide integration with the CLR they were shooting for, reminds me of The Transactional Memory / Garbage Collection Analogy. Joe also wrote a briefer follow-up post, More thoughts on transactional memory, where he talks more about Barbara Liskov's Argus.

Abstract interpreters for free

Matthew Might, "Abstract interpreters for free", Static Analysis Symposium 2010 (SAS 2010).

...we present a two-step method to convert a small-step concrete semantics into a family of sound, computable abstract interpretations. The first step re-factors the concrete state-space to eliminate recursive structure; this refactoring of the state-space simultaneously determines a store-passing-style transformation on the underlying concrete semantics. The second step uses inference rules to generate an abstract state-space and a Galois connection simultaneously. The Galois connection allows the calculation of the “optimal” abstract interpretation. The two-step process is unambiguous, but nondeterministic: at each step, analysis designers face choices. Some of these choices ultimately influence properties such as flow-, field- and context-sensitivity. Thus, under the method, we can give the emergence of these properties a graph-theoretic characterization.

The work in this paper provides some context for known static analysis techniques like k-CFA, and also opens up some interesting new directions for static analysis development. Also, as Matt points out, there are some pedagogical benefits to having a systematic process for getting from semantics to abstract interpretation.

Sapir-Whorf 70 years on

Many a people have looked at Programming Lanugages through the Sapir-Whorf lens so it's not uncommon to find people making PL claims using that hypothesis. Also not surprisingly, the topic keeps re-appearing here on LtU.

This week's NY Times magazine has an article titled Does Your Language Shape How You Think? by Guy Deutscher which starts as a retrospective on Whorf but then goes into what new research has shown.

Some 50 years ago, the renowned linguist Roman Jakobson pointed out a crucial fact about differences between languages in a pithy maxim: “Languages differ essentially in what they must convey and not in what they may convey.” This maxim offers us the key to unlocking the real force of the mother tongue: if different languages influence our minds in different ways, this is not because of what our language allows us to think but rather because of what it habitually obliges us to think about.

...

When your language routinely obliges you to specify certain types of information, it forces you to be attentive to certain details in the world and to certain aspects of experience that speakers of other languages may not be required to think about all the time. And since such habits of speech are cultivated from the earliest age, it is only natural that they can settle into habits of mind that go beyond language itself, affecting your experiences, perceptions, associations, feelings, memories and orientation in the world.