Joe Duffy: A (brief) retrospective on transactional memory

A (brief) retrospective on transactional memory, by Joe Duffy, January 3rd, 2010. Although this is a blog post, don't expect to read it all on your lunch break...

The STM.NET incubator project was canceled May 11, 2010, after beginning public life July 27, 2009 at DevLabs. In this blog post, written 4 months prior to its cancellation, Joe Duffy discusses the practical engineering challenges around implementing Software Transactional Memory in .NET. Note: He starts off with a disclaimer that he was not engaged in the STM.NET project past its initial working group phase.

In short, Joe argues, "Throughout, it became abundantly clear that TM, much like generics, was a systemic and platform-wide technology shift. It didn’t require type theory, but the road ahead sure wasn’t going to be easy." The whole blog post deals with how many implementation challenges platform-wide support for STM would be in .NET, including what options were considered. He does not mention Maurice Herlihy's SXM library approach, but refers to Tim Harris's work several times.

There was plenty here that surprised me, especially when you compare Concurrent Haskell's STM implementation to STM.NET design decisions and interesting debates the team had. In Concurrent Haskell, issues Joe raises, like making Console.WriteLine transactional, are delegated to the type system by the very nature of the TVar monad, preventing programmers from writing such wishywashy code. To be honest, this is why I didn't understand what Joe meant by "it didn't require type theory" gambit, since some of the design concerns are mediated in Concurrent Haskell via type theory. On the other hand, based on the pragmatics Joe discusses, and the platform-wide integration with the CLR they were shooting for, reminds me of The Transactional Memory / Garbage Collection Analogy. Joe also wrote a briefer follow-up post, More thoughts on transactional memory, where he talks more about Barbara Liskov's Argus.

Sapir-Whorf 70 years on

Many a people have looked at Programming Lanugages through the Sapir-Whorf lens so it's not uncommon to find people making PL claims using that hypothesis. Also not surprisingly, the topic keeps re-appearing here on LtU.

This week's NY Times magazine has an article titled Does Your Language Shape How You Think? by Guy Deutscher which starts as a retrospective on Whorf but then goes into what new research has shown.

Some 50 years ago, the renowned linguist Roman Jakobson pointed out a crucial fact about differences between languages in a pithy maxim: “Languages differ essentially in what they must convey and not in what they may convey.” This maxim offers us the key to unlocking the real force of the mother tongue: if different languages influence our minds in different ways, this is not because of what our language allows us to think but rather because of what it habitually obliges us to think about.


When your language routinely obliges you to specify certain types of information, it forces you to be attentive to certain details in the world and to certain aspects of experience that speakers of other languages may not be required to think about all the time. And since such habits of speech are cultivated from the earliest age, it is only natural that they can settle into habits of mind that go beyond language itself, affecting your experiences, perceptions, associations, feelings, memories and orientation in the world.

Bart De Smet on .NET 4's System.Interactive library

Microsoft employee Bart De Smet, who has a widely trafficked blog, has been writing a lot in the past few months about a new library being designed by his group at Microsoft. Here is a whole truckload of blogpost links, in chronological order, which appears to be how Bart intended folks to read it:

Dec 26, 2009: More LINQ with System.Interactive – The Ultimate Imperative
Dec 27, 2009: More LINQ with System.Interactive – Exceptional Exception Handling
Dec 28, 2009: More LINQ with System.Interactive – Sequences under construction
Dec 29, 2009: More LINQ with System.Interactive – Exploiting the code = data relationship
Dec 30, 2009: More LINQ with System.Interactive – More combinators for your Swiss Army Knife
Jan 01, 2010: The Essence of LINQ – MinLINQ
Jan 07, 2010: More LINQ with System.Interactive – Functional fun and taming side-effects

I don't usually read blogs, but I thought this was a pretty cogent series of posts. Also, judging by how interested LtU and the surrounding blogosphere community was from Erik Meijer's presentation on the Rx framework at the JVM Language Summit 2009, I figured people would like this as well.

The Recruitment Theory of Language Origins

Leo Meyerovich recently started a thread on LtU asking about Historical or sociological studies of programming language evolution?. I've been meaning to post a paper on this topic to LtU for awhile now, but simply cherrypicking for the opportune time to fit it into forum discussion. With Leo's question at hand, I give you an interesting paper that models language evolution, by artificial intelligence researcher Luc Steels. Steels has spent over 10 years researching this area, and his recent paper, The Recruitment Theory of Language Origins, summarizes one of his models for dealing with language evolution:

The recruitment theory of language origins argues that language users recruit and try out different strategies for solving the task of communication and retain those that maximise communicative success and cognitive economy. Each strategy requires specific cognitive neural mechanisms, which in themselves serve a wide range of purposes and therefore may have evolved or could be learned independently of language. The application of a strategy has an impact on the properties of the emergent language and this fixates the use of the strategy in the population. Although neurological evidence can be used to show that certain cognitive neural mechanisms are common to linguistic and non-linguistic tasks, this only shows that recruitment has happened, not why. To show the latter, we need models demonstrating that the recruitment of a particular strategy and hence the mechanisms to carry out this strategy lead to a better communication system. This paper gives concrete examples how such models can be built and shows the kinds of results that can be expected from them.

Why Normalization Failed to Become the Ultimate Guide for Database Designers?

While trying to find marshall's claim that Alberto Mendelzon says the universal relation is an idea re-invented once every 3 years (and later finding a quote by Jeffrey Ullman that the universal relation is re-invented 3 times a year), I stumbled across a very provocative rant by a researcher/practitioner: Why Normalization Failed to Become the Ultimate Guide for Database Designers? by Martin Fotache. It shares an interesting wealth of experience and knowledge about logical design. The author is obviously well-read and unlike usual debates I've seen about this topic, presents the argument thoroughly and comprehensively.

The abstract is:

With an impressive theoretical foundation, normalization was supposed to bring rigor and relevance into such a slippery domain as database design is. Almost every database textbook treats normalization in a certain extent, usually suggesting that the topic is so clear and consolidated that it does not deserve deeper discussions. But the reality is completely different. After more than three decades, normalization not only has lost much of its interest in the research papers, but also is still looking for practitioners to apply it effectively. Despite the vast amount of database literature, comprehensive books illustrating the application of normalization to effective real-world applications are still waited. This paper reflects the point of view of an Information Systems academic who incidentally has been for almost twenty years a practitioner in developing database applications. It outlines the main weaknesses of normalization and offers some explanations about the failure of a generous framework in becoming the so much needed universal guide for database designers. Practitioners might be interested in finding out (or confirming) some of the normalization misformulations, misinterpretations, inconsistencies and fallacies. Theorists could find useful the presentation of some issues where the normalization theory was proved to be inadequate, not relevant, or source of confusion.

The body of the paper presents an explanation for why practitioners have rejected normalization. The author also shares his opinion on potentially underexplored ideas as well, drawing from an obviously well-researched depth of knowledge. In recent years, some researchers, such as Microsoft's Pat Helland, have even said Normalization is for sissies (only to further this with later formal publications such as advocating we should be Building on Quicksand). Yet, the PLT community is pushing for the exact opposite. Language theory is firmly rooted in formal grammars and proven correct 'tricks' for manipulating and using those formal grammars; it does no good to define a language if it does not have mathematical properties ensuring relaibility and repeatability of results. This represents and defines real tension between systems theory and PLT.

I realize this paper focuses on methodologies for creating model primitives, comparing mathematical frameworks to frameworks guided by intuition and then mapped to mathematical notions (relations in the relational model), and some may not see it as PLT. Others, such as Date, closely relate understanding of primitives to PLT: Date claims the SQL language is to blame and have gone to the lengths of creating a teaching language, Tutorial D, to teach relational theory. In my experience, nothing seems to effect lines of code in an enterprise system more than schema design, both in the data layer and logic layer, and often an inverse relationship exists between the two; hence the use of object-relational mapping layers to consolidate inevitable problems where there will be The Many Forms of a Single Fact (Kent, 1988). Mapping stabilizes the problem domain by labeling correspondances between all the possible unique structures. I refer to this among friends and coworkers as the N+1 Schema Problem, as there is generally 1 schema thought to be canonical, either extensionally or intensionally, and N other versions of that schema.

Question: Should interactive programming languages aid practitioners in reasoning about their bad data models, (hand waving) perhaps by modeling each unique structure and explaining how they relate? I could see several reasons why that would be a bad idea, but as the above paper suggests, math is not always the best indicator of what practitioners will adopt. It many ways this seems to be the spirit of the idea behind such work as Stephen Kell's interest in approaching modularity by supporting evolutionary compatibility between APIs (source texts) and ABIs (binaries), as covered in his Onward! paper, The Mythical Matched Modules: Overcoming the Tyranny of Inflexible Software Construction. Similar ideas have been in middleware systems for years and are known as wrapper architecures (e.g., Don’t Scrap It, Wrap It!), but haven't seen much PLT interest that I'm aware of; "middleware" might as well be a synonym for Kell's "integration domains" concept.

EASTL -- Electronic Arts Standard Template Library

The gaming studio Electronic Arts maintains their own version of the Standard Template Library. Despite the fact this is old news, I checked the LtU Archives and the new site, and there is no mention of EASTL anywhere. There are quite a few good blog posts about EASTL on the Internet, as well as the the following paper, EASTL -- Electronic Arts Standard Template Library by Paul Pedriana:

Gaming platforms and game designs place requirements on game software which differ from requirements of other platforms. Most significantly, game software requires large amounts of memory but has a limited amount to work with. Gaming software is also faced with other limitations such as weaker processor caches, weaker CPUs, and non-default memory alignment requirements. A result of this is that game software needs to be careful with its use of memory and the CPU. The C++ standard library's containers, iterators, and algorithms are potentially useful for a variety of game programming needs. However, weaknesses and omissions of the standard library prevent it from being ideal for high performance game software. Foremost among these weaknesses is the allocator model. An extended and partially redesigned replacement (EASTL) for the C++ standard library was implemented at Electronic Arts in order to resolve these weaknesses in a portable and consistent way. This paper describes game software development issues, perceived weaknesses of the current C++ standard, and the design of EASTL as a partial solution for these weaknesses.

This paper is a good introduction to a unique set of requirements video game development studios face, and compliments Manuel Simoni's recent story about The AI Systems of Left 4 Dead. This paper could be a useful inroad to those seeking to apply newer object-functional programming languages and ideas to game development.

Back to the Future: Lisp as a Base for a Statistical Computing System

Back to the Future: Lisp as a Base for a Statistical Computing System by Ross Ihaka and Duncan Temple Lang, and the accompanying slides.

This paper was previously discussed on comp.lang.lisp, but apparently not covered on LtU before.

The application of cutting-edge statistical methodology is limited by the capabilities of the systems in which it is implemented. In particular, the limitations of R mean that applications developed there do not scale to the larger problems of interest in practice. We identify some of the limitations of the computational model of the R language that reduces its effectiveness for dealing with large data efficiently in the modern era.

We propose developing an R-like language on top of a Lisp-based engine for statistical computing that provides a paradigm for modern challenges and which leverages the work of a wider community. At its simplest, this provides a convenient, high-level language with support for compiling code to machine instructions for very significant improvements in computational performance. But we also propose to provide a framework which supports more computationally intensive approaches for dealing with large datasets and position ourselves for dealing with future directions in high-performance computing.

We discuss some of the trade-offs and describe our efforts to realizing this approach. More abstractly, we feel that it is important that our community explore more ambitious, experimental and risky research to explore computational innovation for modern data analyses.

Foot note:
Ross Ihaka co-developed the R statistical programming language with Robert Gentleman. For those unaware, R is effectively an open source implementation of S-PLUS, which in turn was based on S. R is sort of the lingua franca of statistics, and you can usually find R code provided in the back of several Springer Verlag monographs.

Duncan Temple Lang is a core developer of R and has worked on the core engine for TIBCO's S-PLUS.

Thanks to LtU user bashyal for providing the links.

Why API Design Matters

Michi Henning, Why API Design Matters, Communications of the ACM, May 2009.

After more than 25 years as a software engineer, I still find myself underestimating the time it takes to complete a particular programming task. Sometimes, the resulting schedule slip is caused by my own shortcomings: as I dig into a problem, I simply discover it is a lot more difficult than I initially thought, so the problem takes longer to solve—such is life as a programmer. Just as often I know exactly what I want to achieve and how to achieve it, but it still takes far longer than anticipated. When that happens, it is usually because I am struggling with an application programming interface (API) that seems to do its level best to throw rocks in my path and make my life difficult. What I find telling is that, even after 25 years of progress in software engineering, this still happens. Worse, recent APIs implemented in modern programming languages make the same mistakes as their 20-year-old counterparts written in C. There seems to be something elusive about API design that, despite years of progress, we have yet to master.

This is a rather accessible look at the consequences of bad API design. Interestingly enough, the main example revolves around the inappropriate use of side effects. The last section concludes with cultural changes the author feels is necessary to improve the situation.

On Understanding Data Abstraction, Revisited

One of the themes of Barbara Liskov's Turing Award lectue ("CS History 101") was that nobody has invented a better programming concept than abstract data types. William Cook wrote a paper for OOPSLA '09 that looks at how well PLT'ers understand their own vocabulary, in particular abstract data types and concepts that on the syntactical surface blend to all seem like ADTs. The paper is On Understanding Data Abstraction, Revisited.

In 1985 Luca Cardelli and Peter Wegner, my advisor, published an ACM Computing Surveys paper called “On understanding types, data abstraction, and polymorphism”. Their work kicked off a flood of research on semantics and type theory for object-oriented programming, which continues to this day. Despite 25 years of research, there is still widespread confusion about the two forms of data abstraction, abstract data types and objects. This essay attempts to explain the differences and also why the differences matter.

The Introduction goes on to say:

What is the relationship between objects and abstract data types (ADTs)? I have asked this question to many groups of computer scientists over the last 20 years. I usually ask it at dinner, or over drinks. The typical response is a variant of “objects are a kind of abstract data type”. This response is consistent with most programming language textbooks.


So what is the point of asking this question? Everyone knows the answer. It’s in the textbooks.


My point is that the textbooks mentioned above are wrong! Objects and abstract data types are not the same thing, and neither one is a variation of the other.

Ergo, if the textbooks are wrong, then your Dinner Answer to (the) Cook is wrong! The rest of the paper explains how Cook makes computer scientists sing for their supper ;-)

When I’m inciting discussion of this topic over drinks, I don’t tell the the full story up front. It is more fun to keep asking questions as the group explores the topic. It is a lively discussion, because most of these ideas are documented in the literature and all the basic facts are known. What is interesting is that the conclusions to be drawn from the facts are not as widely known.

Design Patterns 15 Years Later: An Interview with Erich Gamma, Richard Helm, and Ralph Johnson

Larry O'Brien recently interviewed three of the Gang of Four about their seminal work on patterns. Larry teased the interview's readers for awhile, but he eventually asked the pressing question that most language designers ask and debate about patterns ;) Here it is:

Larry: GoF came out relatively early in the ascent of OOP as the mainstream paradigm and, for better or worse, "patterns" seem to be associated with OO approaches. You even hear functional and dynamic advocates boasting that their languages "don't need" patterns. How do you respond to that?

Erich: Just as an aside, it is also easy to forget that we wrote design patterns before there was Java or C#.

Ralph: Some of those languages don't need some of the patterns in Design Patterns because their languages provide alternative ways of solving the problems. Our patterns are for languages between C++ and Smalltalk, which includes just about everything called "object-oriented," but they certainly are not for every programming language. I don't think anybody actually says that programmers in other languages don't need patterns; they just have a different set of patterns.

Erich: Design patterns eventually emerge for any language. Design déjà-vu is language neutral. While these experiences are not always captured as patterns, they do exist. The design principles for Erlang come to mind.

Larry: Where would a person go to learn about patterns for dynamic and functional languages? Who's making good contributions?

Ralph: If by "dynamic" you mean dynamic object-oriented languages like Smalltalk, Ruby or Python, then our patterns are applicable. Functional languages require different patterns, but I don't know who is working on them.

Note: At the end of the interview, Erich says that they tried refactoring the patterns into new categories in 2005. The draft breakdown he provides (accidentally???) takes out Memento, Chain of Responsibility, Bridge, Adapter, and Observer.

As I said above these are just notes in a draft state. Doing a refactoring without test cases is always dangerous...

UPDATE: The Gang of Four have an accompanying article for the interview that they wrote as a group. See A Look Back: Why We Wrote Design Patterns: Elements of Reusable Object-Oriented Software.

XML feed