Why Normalization Failed to Become the Ultimate Guide for Database Designers?

While trying to find marshall's claim that Alberto Mendelzon says the universal relation is an idea re-invented once every 3 years (and later finding a quote by Jeffrey Ullman that the universal relation is re-invented 3 times a year), I stumbled across a very provocative rant by a researcher/practitioner: Why Normalization Failed to Become the Ultimate Guide for Database Designers? by Marin Fotache. It shares a wealth of experience and knowledge about logical design. The author is obviously well-read and, unlike the usual debates I've seen on this topic, he presents the argument thoroughly and comprehensively.

The abstract is:

With an impressive theoretical foundation, normalization was supposed to bring rigor and relevance into such a slippery domain as database design is. Almost every database textbook treats normalization in a certain extent, usually suggesting that the topic is so clear and consolidated that it does not deserve deeper discussions. But the reality is completely different. After more than three decades, normalization not only has lost much of its interest in the research papers, but also is still looking for practitioners to apply it effectively. Despite the vast amount of database literature, comprehensive books illustrating the application of normalization to effective real-world applications are still waited. This paper reflects the point of view of an Information Systems academic who incidentally has been for almost twenty years a practitioner in developing database applications. It outlines the main weaknesses of normalization and offers some explanations about the failure of a generous framework in becoming the so much needed universal guide for database designers. Practitioners might be interested in finding out (or confirming) some of the normalization misformulations, misinterpretations, inconsistencies and fallacies. Theorists could find useful the presentation of some issues where the normalization theory was proved to be inadequate, not relevant, or source of confusion.

The body of the paper presents an explanation for why practitioners have rejected normalization. The author also shares his opinion on potentially underexplored ideas, drawing on an obviously well-researched depth of knowledge. In recent years, some researchers, such as Microsoft's Pat Helland, have even said Normalization is for sissies (and later followed this up with formal publications, such as one advocating that we should be Building on Quicksand). Yet the PLT community is pushing for the exact opposite. Language theory is firmly rooted in formal grammars and proven-correct 'tricks' for manipulating and using those grammars; it does no good to define a language if it lacks mathematical properties ensuring reliability and repeatability of results. This is a real tension between systems theory and PLT.

I realize this paper focuses on methodologies for creating model primitives, comparing mathematical frameworks to frameworks guided by intuition and then mapped to mathematical notions (relations in the relational model), and some may not see it as PLT. Others, such as Date, closely relate the understanding of primitives to PLT: Date claims the SQL language is to blame and has gone to the lengths of creating a teaching language, Tutorial D, to teach relational theory. In my experience, nothing seems to affect the lines of code in an enterprise system more than schema design, both in the data layer and the logic layer, and often an inverse relationship exists between the two; hence the use of object-relational mapping layers to contain the inevitable problems that arise wherever there are The Many Forms of a Single Fact (Kent, 1988). Mapping stabilizes the problem domain by labeling correspondences between all the possible unique structures. I refer to this among friends and coworkers as the N+1 Schema Problem, as there is generally 1 schema thought to be canonical, either extensionally or intensionally, and N other versions of that schema.
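As a rough illustration of the N+1 Schema Problem, the sketch below stores one fact ("employee works in department") in a canonical form and in two variant forms, with mapping functions between them. All names here (`CANONICAL`, `to_by_employee`, `to_by_department`, and so on) are hypothetical and chosen for illustration; they are not taken from Kent's paper or from any ORM. It also assumes each employee belongs to exactly one department, otherwise the first variant would lose information.

```python
from collections import defaultdict

# Canonical form (hypothetical): a set of (employee, department) pairs.
CANONICAL = {("alice", "sales"), ("bob", "sales"), ("carol", "ops")}


def to_by_employee(pairs):
    """Variant 1: the fact as an attribute of the employee."""
    return {employee: department for employee, department in pairs}


def to_by_department(pairs):
    """Variant 2: the fact as a collection owned by the department."""
    grouped = defaultdict(set)
    for employee, department in pairs:
        grouped[department].add(employee)
    return dict(grouped)


def from_by_employee(by_employee):
    """Map variant 1 back to the canonical pairs."""
    return set(by_employee.items())


def from_by_department(by_department):
    """Map variant 2 back to the canonical pairs."""
    return {
        (employee, department)
        for department, employees in by_department.items()
        for employee in employees
    }


by_employee = to_by_employee(CANONICAL)
by_department = to_by_department(CANONICAL)

# Both variants carry the same fact: each maps back to the canonical form.
assert from_by_employee(by_employee) == CANONICAL
assert from_by_department(by_department) == CANONICAL
```

With N variants there are N such pairs of mappings to keep in sync with the canonical schema, which is where the lines of code tend to accumulate.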

Question: Should interactive programming languages aid practitioners in reasoning about their bad data models, (hand waving) perhaps by modeling each unique structure and explaining how they relate? I could see several reasons why that would be a bad idea, but as the above paper suggests, math is not always the best indicator of what practitioners will adopt. In many ways this seems to be the spirit of the idea behind such work as Stephen Kell's interest in approaching modularity by supporting evolutionary compatibility between APIs (source texts) and ABIs (binaries), as covered in his Onward! paper, The Mythical Matched Modules: Overcoming the Tyranny of Inflexible Software Construction. Similar ideas have been in middleware systems for years and are known as wrapper architectures (e.g., Don’t Scrap It, Wrap It!), but they haven't seen much PLT interest that I'm aware of; "middleware" might as well be a synonym for Kell's "integration domains" concept.

Back to the Future: Lisp as a Base for a Statistical Computing System

Back to the Future: Lisp as a Base for a Statistical Computing System by Ross Ihaka and Duncan Temple Lang, and the accompanying slides.

This paper was previously discussed on comp.lang.lisp, but apparently not covered on LtU before.

The application of cutting-edge statistical methodology is limited by the capabilities of the systems in which it is implemented. In particular, the limitations of R mean that applications developed there do not scale to the larger problems of interest in practice. We identify some of the limitations of the computational model of the R language that reduces its effectiveness for dealing with large data efficiently in the modern era.

We propose developing an R-like language on top of a Lisp-based engine for statistical computing that provides a paradigm for modern challenges and which leverages the work of a wider community. At its simplest, this provides a convenient, high-level language with support for compiling code to machine instructions for very significant improvements in computational performance. But we also propose to provide a framework which supports more computationally intensive approaches for dealing with large datasets and position ourselves for dealing with future directions in high-performance computing.

We discuss some of the trade-offs and describe our efforts to realizing this approach. More abstractly, we feel that it is important that our community explore more ambitious, experimental and risky research to explore computational innovation for modern data analyses.

Footnote:
Ross Ihaka co-developed the R statistical programming language with Robert Gentleman. For those unaware, R is effectively an open source implementation of S-PLUS, which in turn was based on S. R is sort of the lingua franca of statistics, and you can usually find R code provided in the back of several Springer Verlag monographs.

Duncan Temple Lang is a core developer of R and has worked on the core engine for TIBCO's S-PLUS.

Thanks to LtU user bashyal for providing the links.

The Development of Sage

Sage is a project to create a viable free open source alternative to Magma, Maple, Mathematica and Matlab. The lead developer/manager William Stein has recently written Mathematical Software and Me: A Very Personal Recollection, a rather enjoyable story of his experience with mathematical software, especially Magma, and how Sage came to be.

One of the difficulties of writing broadly useful math software is the sheer size and scope of such a project. It is easily outside the abilities of even the most prodigious lone developer. So the focus of Sage, at least up until recently, has been on creating Python-based interfaces to existing mathematical software. For example, for symbolic calculation the Sage distribution includes Maxima (written in Common Lisp), a fork of Macsyma dating back to the early 1980s, and released as open-source software by the US Department of Energy approximately 10 years ago. In addition to Maxima, Sage includes the ability to call out to Magma, Mathematica, and Maple.

There are some interesting PLT-related snippets. For example, Magma's language is frequently criticized, although its algorithms are frequently praised. In conversations with others, OCaml and Haskell were brought up, but William Stein chose Python because he felt that it was more accessible. Also, Axiom, which includes the dependently-typed language Aldor, was rejected in favor of Maxima because Maxima was less esoteric and much more widely used.

Two Bits: The Cultural Significance of Free Software

Christopher Kelty's book, Two Bits: The Cultural Significance of Free Software, can be read online, and I think parts of it will interest many here.

It seems that programming languages, while mentioned, do not receive a lot of attention in this work. I would argue that they are a significant factor in the history that is being told, and an important resource for historians (though reading the history from the languages is not a trivial undertaking by any means).

Still, it seems like a very good discussion and well worth pursuing.

Edited to add: As Z-Bo mentions in the comments, the website of the book invites people to re-mix it (or "modulate" it). Motivated readers can thus add the relevant PL perspective, if they so wish.

On Understanding Data Abstraction, Revisited

One of the themes of Barbara Liskov's Turing Award lecture ("CS History 101") was that nobody has invented a better programming concept than abstract data types. William Cook wrote a paper for OOPSLA '09 that looks at how well PLT'ers understand their own vocabulary, in particular abstract data types and the concepts that, on the syntactic surface, all blend together to look like ADTs. The paper is On Understanding Data Abstraction, Revisited.

In 1985 Luca Cardelli and Peter Wegner, my advisor, published an ACM Computing Surveys paper called “On understanding types, data abstraction, and polymorphism”. Their work kicked off a flood of research on semantics and type theory for object-oriented programming, which continues to this day. Despite 25 years of research, there is still widespread confusion about the two forms of data abstraction, abstract data types and objects. This essay attempts to explain the differences and also why the differences matter.

The Introduction goes on to say:

What is the relationship between objects and abstract data types (ADTs)? I have asked this question to many groups of computer scientists over the last 20 years. I usually ask it at dinner, or over drinks. The typical response is a variant of “objects are a kind of abstract data type”. This response is consistent with most programming language textbooks.


So what is the point of asking this question? Everyone knows the answer. It’s in the textbooks.


My point is that the textbooks mentioned above are wrong! Objects and abstract data types are not the same thing, and neither one is a variation of the other.

Ergo, if the textbooks are wrong, then your Dinner Answer to (the) Cook is wrong! The rest of the paper explains how Cook makes computer scientists sing for their supper ;-)

When I’m inciting discussion of this topic over drinks, I don’t tell the full story up front. It is more fun to keep asking questions as the group explores the topic. It is a lively discussion, because most of these ideas are documented in the literature and all the basic facts are known. What is interesting is that the conclusions to be drawn from the facts are not as widely known.
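To make the distinction concrete, here is a minimal sketch using integer sets. It is my own illustration under stated assumptions, not code from the paper: the class names `SetObject` and `SetADT` are hypothetical, and Python only hides representations by convention (the leading underscore), so the sketch shows the design difference rather than enforcing it.

```python
from typing import Callable


# --- Object style: a set is whatever answers the interface -------------
class SetObject:
    """A set defined purely by its behavior (its membership test)."""

    def __init__(self, contains: Callable[[int], bool]):
        self._contains = contains

    def contains(self, n: int) -> bool:
        return self._contains(n)

    def insert(self, n: int) -> "SetObject":
        return SetObject(lambda m: m == n or self.contains(m))

    def union(self, other: "SetObject") -> "SetObject":
        # Only the other's public interface is used, never its representation.
        return SetObject(lambda m: self.contains(m) or other.contains(m))


EMPTY_OBJECT = SetObject(lambda m: False)
EVENS = SetObject(lambda m: m % 2 == 0)  # an infinite set is fine here


# --- ADT style: one hidden representation, shared by all operations ----
class SetADT:
    """A set with a single representation: a sorted, duplicate-free tuple."""

    def __init__(self, items=()):
        self._items = tuple(sorted(set(items)))

    def contains(self, n: int) -> bool:
        return n in self._items

    def insert(self, n: int) -> "SetADT":
        return SetADT(self._items + (n,))

    def union(self, other: "SetADT") -> "SetADT":
        # Inspects the representation of *both* arguments.
        return SetADT(self._items + other._items)

    def equals(self, other: "SetADT") -> bool:
        # Easy because both values are known to share one representation.
        return self._items == other._items


small = EMPTY_OBJECT.insert(3).union(EVENS)
assert small.contains(3) and small.contains(10) and not small.contains(5)

a = SetADT([1, 2]).union(SetADT([2, 3]))
assert a.equals(SetADT([3, 2, 1]))
```

The trade-off shows up in the binary operations: the object version can mix any implementations (including an infinite set) but has no general way to test equality, while the ADT version gets cheap equality and union by committing to a single representation.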

Liskov's list of papers

Ralph Johnson posted the list of papers that Liskov mentioned as having influenced her.

As good a place to start as any, I'd say.

Retrospective: An Axiomatic Basis for Computer Programming

Retrospective: An Axiomatic Basis for Computer Programming, by C.A.R. Hoare:

This month marks the 40th anniversary of the publication of the first article I wrote as an academic. I have been invited to give my personal view of the advances that have been made in the subject since then, and the further advances that remain to be made. Which of them did I expect, and which of them surprised me?

An interesting review of the history of computing. He has some nice perspectives on the complementarity of testing and formal methods, and how the growing cracking industry became an unexpected driving force behind industrial interest in verification.

Coders at Work

Peter Seibel's book Coders at Work is apparently available for purchase, so this is a good time to say a few words about it here.

The book consists of interviews with several illustrious programmers about their personal histories, programming style, likes and dislikes and so on. Among the interviewees are several who are well known in programming language circles and are mentioned regularly on LtU, for example Brendan Eich, Joe Armstrong, Simon Peyton Jones, Peter Norvig, Guy Steele, Dan Ingalls, and Ken Thompson. The interviews go into more detail and depth than I dared hope for or expected, though as is inevitable you end up annoyed that a question you really wanted answered did not come up.

I am sure LtU readers will want to read these interviews for themselves and revel in the technical miscellanea (I read them on a long flight from the US to Australia...), so I am not going to post a detailed review with spoilers. It would be more fun to hear the questions you guys would have asked had you conducted the interviews (and to know which answers you want to quibble with!) So in lieu of a long and tedious review, here are a few things from a couple of the interviews that caught my attention and that I think will interest LtU members.

For some reason I started by jumping to the interview with Dan Ingalls. It turned out to contain many nice morsels to chew on. Dan emphasizes an attribute that might be called programmability all the way down (he is a Smalltalk guy, so that's not such a surprise, I guess): "You should be able, in a computing environment, to zero in on music and musical synthesis and sound and just understand how the whole thing works. It should be accessible. The same thing with graphics." Not surprisingly, Ingalls admits to having an exploratory programming style (in contrast to Knuth who wrote TeX in a notebook...), which probably influenced the types of language he found himself working on. This seems to be the case for several other interviewees as well. Interestingly, Ingalls recalls being influenced by APL. The interactive environment was part of it, but significantly he also mentions the influence on him of the fact that it is expression oriented and not statement oriented like Fortran.

And oh, Ingalls also opines on the age-old question: should programmer education begin with assembly? His answer: No. As you would expect, other interviewees probably feel differently.

The interview with Knuth is also very interesting, as you might expect. Here are a few of the things I picked up on in his interview. From his description, it would seem that Knuth was doing his own style of test driven development, though for some reason this angle is not elaborated on in the interview. While Ingalls ponders how to expose kids (and adults) to programming, and Norvig reflects on the failures of end-user programming, Knuth recites his observation that 2% of people are natural born programmers (my words) since they "really resonate with the machine." Perhaps surprisingly Knuth is here concerned with being attuned to the way the machine "really works," not to algorithmic thinking in a general sense. Perhaps, I wonder, what really unites programmers is a compulsion to program: Knuth admits to having the need to program even before having breakfast.

Knuth has a challenge for programming language designers. He claims that every time a new language comes out it cleans up what's already understood, and then adds something new and experimental. How about "setting our sights lower" and aiming for stability? "It might be a good idea," he says. I am pretty sure some here will argue that we have too much lowering of expectations already...

What really resonated with me, and with the LtU ethos, was Knuth's lament about people not going back to the original papers and source materials. He puts it simply and powerfully: "I wish I could... instill in more people the love that I have for reading original sources... I was unable to pass that on to any of my students." LtU has always had a history department, and going back to historical papers is something I personally love doing. Maybe Knuth should guest blog on LtU... Really! He talks about having collections of source code, compilers in particular - we want to know more!

Finally, let me note the nice contrasts you find among the interviewees. Naturally, these manifest themselves in differing opinions about C. While Knuth sees the C pointer as one of the great advances in computer science, Fran Allen argues that "C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine."

And if that's not a call to action to all you programming language fanatics, what is?

Disclosure: I was asked to provide a blurb for the cover of the book, and so read the interviews before the book was published. Other than that I had no involvement with the creation of the book, and I have no stake in its success.

Apollo 11 Source Code on GoogleCode

A blog post announces that some of the source code for the Apollo 11 spacecraft has been put online.

On this day 40 years ago, Neil Armstrong and Buzz Aldrin became the first humans to walk on the Moon. This was quite an achievement for mankind and a key milestone in world history.

To commemorate this event the Command Module code (Comanche054) and Lunar Module code (Luminary099) have been transcribed from scanned images to run on yaAGC (an open source AGC emulator) by the Virtual AGC and AGS project.

Since we LtUers spend a lot of time talking about the highest of the high level languages, it's illuminating to see how much was done with so little. The source also shows that flying to the moon is really not that different from the kind of programming most programmers do every day. Note the comments.

# Page 801
		CAF	TWO		# WCHPHASE = 2 ---> VERTICAL: P65,P66,P67

MitchFest 2009: Symposium in Honor of Mitchell Wand

I'm pleased to announce that we are planning a celebration for Mitch Wand's 60th birthday!

From the MitchFest home page:

Northeastern University is hosting a special Symposium in celebration of Dr. Mitchell Wand's 60th birthday and honoring his pioneering work in the field of programming languages. For over 30 years Mitch has made important contributions to many areas of programming languages, including semantics, continuations, type theory, hygienic macros, compiler correctness, static analysis and formal verification.

After receiving his PhD from MIT in 1973, Mitch taught at Indiana University where he and colleague Daniel P. Friedman wrote the first edition of their seminal text, Essentials of Programming Languages. Described by a reviewer as "so influential that the initials EOPL are a widely understood shorthand," the text is now in its third edition from MIT Press. Mitch joined the faculty of Northeastern University in 1985 and has been a leader in the College of Computer and Information Science. In 2007, Mitch was inducted as a Fellow of the Association for Computing Machinery.

Please join us at Northeastern on August 23rd and 24th as we celebrate this personal milestone and pay tribute to a great computer scientist, researcher, teacher and colleague, Dr. Mitchell (Mitch) Wand.

LtU regulars will recall that we've discussed DanFest 2004 here before, as well as the talk videos.

MitchFest is open to the public and coordinated with Scheme Workshop 2009, which will be at MIT on August 22nd (the same weekend). More event information, including registration, is available on the MitchFest home page. Following the Symposium, we will be publishing a special edition of HOSC as a Festschrift in honor of Mitch.

We will post a schedule on the web site soon, but for now you can view the preliminary list of papers in the Call for Participation.

Update: added link to HOSC.
