Memory Models: A Case for Rethinking Parallel Languages and Hardware, CACM, August 2010

Memory Models: A Case for Rethinking Parallel Languages and Hardware by Sarita V. Adve and Hans-J. Boehm

This is a pre-print of the actual version.

The era of parallel computing for the masses is here, but writing correct parallel programs remains far more difficult than writing sequential programs. Aside from a few domains, most parallel programs are written using a shared-memory approach. The memory model, which specifies the meaning of shared variables, is at the heart of this programming model. Unfortunately, it has involved a tradeoff between programmability and performance, and has arguably been one of the most challenging and contentious areas in both hardware architecture and programming language specification. Recent broad community-scale efforts have finally led to a convergence in this debate, with popular languages such as Java and C++ and most hardware vendors publishing compatible memory model specifications. Although this convergence is a dramatic improvement, it has exposed fundamental shortcomings in current popular languages and systems that prevent achieving the vision of structured and safe parallel programming.

This paper discusses the path to the above convergence, the hard lessons learned, and their implications. A cornerstone of this convergence has been the view that the memory model should be a contract between the programmer and the system - if the programmer writes disciplined (data-race-free) programs, the system will provide high programmability (sequential consistency) and performance. We discuss why this view is the best we can do with current popular languages, and why it is inadequate moving forward. We then discuss research directions that eliminate much of the concern about the memory model, but require rethinking popular parallel languages and hardware. In particular, we argue that parallel languages should not only promote high-level disciplined models, but they should also enforce the discipline. Further, for scalable and efficient performance, hardware should be co-designed to take advantage of and support such disciplined models. The inadequacies of the state-of-the-art and the research agenda we outline have deep implications for the practice, research, and teaching of many computer science sub-disciplines, spanning theory, software, and hardware.

The IO Monad is 45 years old

Oleg Kiselyov wrote a mail to haskell-cafe today titled, The IO Monad is 45 years old. I thought LtU readers might like this.

Pure and Declarative Syntax Definition: Paradise Lost and Regained, Onward 2010

Pure and Declarative Syntax Definition: Paradise Lost and Regained by Lennart C. L. Kats, Eelco Visser, Guido Wachsmuth from Delft

Syntax definitions are pervasive in modern software systems, and serve as the basis for language processing tools like parsers and compilers. Mainstream parser generators pose restrictions on syntax definitions that follow from their implementation algorithm. They hamper evolution, maintainability, and compositionality of syntax definitions. The pureness and declarativity of syntax definitions is lost. We analyze how these problems arise for different aspects of syntax definitions, discuss their consequences for language engineers, and show how the pure and declarative nature of syntax definitions can be regained.

I haven't compared this version with the Onward 2010 version, but they look essentially the same. It seems timely to post this paper, considering the other recent story Yacc is dead. There is not a whole lot to argue against in this paper, since we all "know" the other approaches aren't as elegant and only resort to them for specific reasons such as efficiency. Yet, this is the first paper I know of that tries to state the argument to software engineers.

For example, the Dragon Book, in every single edition, effectively brushes these topics aside. In particular, the Dragon Book does not even mention scannerless parsing as a technique, and instead only explains the "advantages" of using a scanner. Unfortunately, the authors of this paper don't consider other design proposals, either, such as Van Wyk's context-aware scanners from GPCE 2007. It is examples like these that made me wish the paper was a bit more robust in its analysis; the examples seem focused on the author's previous work.

If you are not familiar with the author's previous work in this area, the paper covers it in the references. It includes Martin Bravenboer's work on modular Eclipse IDE support for AspectJ.

The Triumph of Types: Principia Mathematica's Impact on Computer Science

The Triumph of Types: Principia Mathematica's Impact on Computer Science. Robert L. Constable

The role the ideas of Principia Mathematica played in type theory in programming languages is often alluded to in our discussions, making this contribution to a meeting celebrating the hundredth anniversary of Whitehead-and-Russell's opus provocative.

To get your juices going here is a quote from page 3:

...I will discuss later our efforts at Cornell to create one such type theory, Computational Type Theory (CTT), very closely related to two others, the Calculus of Inductive Constructions (CIC) implemented in the Coq prover and widely used, and Intuitionistic Type Theory (ITT) implemented in the Alf and Agda provers. All three of these efforts, but especially CTT and ITT, were strongly influenced by Principia and the work of Bishop presented in his book Foundations of Constructive Analysis.

The Resurgence of Parallelism

Peter J. Denning and Jack B. Dennis, The Resurgence of Parallelism, Communications of the ACM, Vol. 53 No. 6, Pages 30-32, 10.1145/1743546.1743560

"Multi-core chips are a new paradigm!" "We are entering the age of parallelism!" These are today's faddish rallying cries for new lines of research and commercial development. ... The parallel architecture research of the 1960s and 1970s solved many problems that are being encountered today. Our objective in this column is to recall the most important of these results and urge their resurrection.

A brief but timely reminder that we should avoid reinventing the wheel. Denning and Dennis give a nice capsule summary of the history of parallel computing research, and highlight some of the key ideas that came out of earlier research on parallel computing. This isn't a technically deep article. But it gives a quick overview of the field, and tries to identify some of the things that actually are research challenges rather than problems for which the solutions have seemingly been forgotten.

Algol 58/60

Paul McJones has been curating ALGOL section of Software Preservation Group. He notes:

I recently created an ALGOL section at the Computer History Museum’s Software Preservation Group web site, covering the language standardization efforts — for ALGOL 58 (also known as the International Algebraic Language), ALGOL 60, and ALGOL 68 — and also covering many implementations, dialects, and offshoots, complete with source code, manuals, and papers for many of these. The history of ALGOL has attracted many writers, and the final section of the web site links to many of their papers.

Also see his follow up blog about Whetstone ALGOL.

A Formal System For Euclid's Elements

A Formal System For Euclid's Elements, Jeremy Avigad, Edward Dean, and John Mumma. Review of Symbolic Logic, Vol. 2, No. 4, 2009.

Abstract. We present a formal system, E, which provides a faithful model of the proofs in Euclid’s Elements, including the use of diagrammatic reasoning.

Diagrammatic languages are a perennial favorite discussion topic here, and Euclid's proofs constitute one of the oldest diagrammatic languages around. And yet for hundreds of years (at least since Leibniz) people have argued about whether or not the diagrams are really part of a formal system of reasoning, or whether they are simply visual aids hanging on the side of the true proof. The latter position is the one that Hilbert and Tarski took as well when they gave formal axiomatic systems for geometry.

But was this necessary, or just a contingent fact of the logical machinery available to them? Avigad and his coauthors show the former point of view also works, and that you can do it with very basic proof theory (there's little here unfamiliar to anyone who has read Pierce's book). Yet it sheds a lot of light on how the diagrams in the Elements work, in part because of their very careful analysis of how to read the diagrams -- that is, what conclusion a diagram really licenses you to draw, and which ones are accidents of the specific figure on the page. How they consider these issues is a good model for anyone designing their own visual programming languages.

Google TechTalk: The Evolution of End-User Programming

End-User Programming has been a topical discussion lately in mainstream software outlets. The IEEE journal Software recently had an issue dedicated to end-user programming challenges; see Joel Brandt's Opportunistic Programming: Writing Code to Prototype, Ideate and Discover and Martin Erwig's Software Engineering for Spreadsheets. Also, a few years ago a consortium of universities formed End-Users Shaping Effective Software, which includes Martin Erwig's PLT work on bringing type systems to spreadsheets.

Recently, Google invited Allen Cypher to give a TechTalk on The Evolution of End-User Programming, which appears to be a recapitulation of his VL/HCC paper by the same name. Allen was the editor of Watch What I Do (an LtU recommended reading).

Towards the end of the talk, Allen mentions the practical issues of knowing when to use what tool, and that novice users struggle with finding the right tool for the right job. What's notable about discussion of end-user software engineering is how little attention its proponents pay to its critics biggest criticism: Security. In the IEEE Software realm, probably the most open critic has been Warren Harrison (see: The Dangers of End-User Programming). For example, Ko's 2009 ACM Computing Survey The State of the Art in End-User Software Engineering only mentions security once, in the context of designing end-user description languages for security, but does not assess how well this technique compares to techniques software engineers might employ. It seems strange that leading researchers in visual languages and end-user programming do not discuss the potential usage of object capability systems, especially as companies try to monetize a percentage of the value added by users who mash-up their service with other services.

Why Normalization Failed to Become the Ultimate Guide for Database Designers?

While trying to find marshall's claim that Alberto Mendelzon says the universal relation is an idea re-invented once every 3 years (and later finding a quote by Jeffrey Ullman that the universal relation is re-invented 3 times a year), I stumbled across a very provocative rant by a researcher/practitioner: Why Normalization Failed to Become the Ultimate Guide for Database Designers? by Martin Fotache. It shares an interesting wealth of experience and knowledge about logical design. The author is obviously well-read and unlike usual debates I've seen about this topic, presents the argument thoroughly and comprehensively.

The abstract is:

With an impressive theoretical foundation, normalization was supposed to bring rigor and relevance into such a slippery domain as database design is. Almost every database textbook treats normalization in a certain extent, usually suggesting that the topic is so clear and consolidated that it does not deserve deeper discussions. But the reality is completely different. After more than three decades, normalization not only has lost much of its interest in the research papers, but also is still looking for practitioners to apply it effectively. Despite the vast amount of database literature, comprehensive books illustrating the application of normalization to effective real-world applications are still waited. This paper reflects the point of view of an Information Systems academic who incidentally has been for almost twenty years a practitioner in developing database applications. It outlines the main weaknesses of normalization and offers some explanations about the failure of a generous framework in becoming the so much needed universal guide for database designers. Practitioners might be interested in finding out (or confirming) some of the normalization misformulations, misinterpretations, inconsistencies and fallacies. Theorists could find useful the presentation of some issues where the normalization theory was proved to be inadequate, not relevant, or source of confusion.

The body of the paper presents an explanation for why practitioners have rejected normalization. The author also shares his opinion on potentially underexplored ideas as well, drawing from an obviously well-researched depth of knowledge. In recent years, some researchers, such as Microsoft's Pat Helland, have even said Normalization is for sissies (only to further this with later formal publications such as advocating we should be Building on Quicksand). Yet, the PLT community is pushing for the exact opposite. Language theory is firmly rooted in formal grammars and proven correct 'tricks' for manipulating and using those formal grammars; it does no good to define a language if it does not have mathematical properties ensuring relaibility and repeatability of results. This represents and defines real tension between systems theory and PLT.

I realize this paper focuses on methodologies for creating model primitives, comparing mathematical frameworks to frameworks guided by intuition and then mapped to mathematical notions (relations in the relational model), and some may not see it as PLT. Others, such as Date, closely relate understanding of primitives to PLT: Date claims the SQL language is to blame and have gone to the lengths of creating a teaching language, Tutorial D, to teach relational theory. In my experience, nothing seems to effect lines of code in an enterprise system more than schema design, both in the data layer and logic layer, and often an inverse relationship exists between the two; hence the use of object-relational mapping layers to consolidate inevitable problems where there will be The Many Forms of a Single Fact (Kent, 1988). Mapping stabilizes the problem domain by labeling correspondances between all the possible unique structures. I refer to this among friends and coworkers as the N+1 Schema Problem, as there is generally 1 schema thought to be canonical, either extensionally or intensionally, and N other versions of that schema.

Question: Should interactive programming languages aid practitioners in reasoning about their bad data models, (hand waving) perhaps by modeling each unique structure and explaining how they relate? I could see several reasons why that would be a bad idea, but as the above paper suggests, math is not always the best indicator of what practitioners will adopt. It many ways this seems to be the spirit of the idea behind such work as Stephen Kell's interest in approaching modularity by supporting evolutionary compatibility between APIs (source texts) and ABIs (binaries), as covered in his Onward! paper, The Mythical Matched Modules: Overcoming the Tyranny of Inflexible Software Construction. Similar ideas have been in middleware systems for years and are known as wrapper architecures (e.g., Don’t Scrap It, Wrap It!), but haven't seen much PLT interest that I'm aware of; "middleware" might as well be a synonym for Kell's "integration domains" concept.

Back to the Future: Lisp as a Base for a Statistical Computing System

Back to the Future: Lisp as a Base for a Statistical Computing System by Ross Ihaka and Duncan Temple Lang, and the accompanying slides.

This paper was previously discussed on comp.lang.lisp, but apparently not covered on LtU before.

The application of cutting-edge statistical methodology is limited by the capabilities of the systems in which it is implemented. In particular, the limitations of R mean that applications developed there do not scale to the larger problems of interest in practice. We identify some of the limitations of the computational model of the R language that reduces its effectiveness for dealing with large data efficiently in the modern era.

We propose developing an R-like language on top of a Lisp-based engine for statistical computing that provides a paradigm for modern challenges and which leverages the work of a wider community. At its simplest, this provides a convenient, high-level language with support for compiling code to machine instructions for very significant improvements in computational performance. But we also propose to provide a framework which supports more computationally intensive approaches for dealing with large datasets and position ourselves for dealing with future directions in high-performance computing.

We discuss some of the trade-offs and describe our efforts to realizing this approach. More abstractly, we feel that it is important that our community explore more ambitious, experimental and risky research to explore computational innovation for modern data analyses.

Foot note:
Ross Ihaka co-developed the R statistical programming language with Robert Gentleman. For those unaware, R is effectively an open source implementation of S-PLUS, which in turn was based on S. R is sort of the lingua franca of statistics, and you can usually find R code provided in the back of several Springer Verlag monographs.

Duncan Temple Lang is a core developer of R and has worked on the core engine for TIBCO's S-PLUS.

Thanks to LtU user bashyal for providing the links.

XML feed