Ethnographic Study of Copy and Paste Programming Practices in OOPL

From the abstract:

When programmers develop and evolve software, they frequently copy and paste (C&P) code from an existing code base, or sources such as web pages or documentation. We believe that programmers follow a small number of well defined C&P usage patterns when they program, and understanding these patterns would enable us to design tools to improve the quality of software.

I noticed this paper in the references of a (draft) paper on subtext, which is a project previously discussed on LtU. I think that this is interesting from a language design perspective and couldn't find a previous discussion on LtU. (I'll defer my other comments (read: critique) to a later post - if any.)

Wow

Java programmers repeat themselves a lot.
Java programmers do a lot of Copy-Pasting.
Copy-Pasted code is harder to debug and maintain.
Therefore, we need smarter source-code editors.

Wow.

(In fairness to the authors, functional languages are mentioned ... once.)

By Peter McArthur at Sat, 2006-05-20 11:08

"Functional"?

I don't think "functional" is nearly as important as "terse" or "non-redundant" here. The last time I programmed in Java, I copied and pasted like mad, mainly because there's so much boilerplate necessary to get even the simplest things done ("try { ... } catch (IOException e) { ... }"). Typing it out over and over made my fingers ache. Now, I suppose an editor could help with this, but then you have the inverse problem of reading all the repetitive bits to get to what the program actually does. But then again, an editor with one of those outline things on the side and some good code folding can help with that, too... Ergo, the best way to program in Java is to use an environment that removes the need to either read or write it! :)
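A minimal sketch of what "non-redundant" could look like for that try/catch case, assuming a later Java (8 or newer) with lambdas, which was not available when this thread was written. The names IoAction and Unchecked are made up for illustration and are not part of any library:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper names (IoAction, Unchecked); not part of any library.
interface IoAction<T> {
    T run() throws IOException;
}

final class Unchecked {
    private Unchecked() {}

    // Runs the action and rethrows any IOException as an unchecked exception,
    // so call sites don't each need their own try/catch block.
    static <T> T call(IoAction<T> action) {
        try {
            return action.run();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class BoilerplateDemo {
    // Reads the first line of the given text; no catch block at the call site.
    static String firstLine(String text) {
        return Unchecked.call(() -> {
            try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
                return reader.readLine();
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(firstLine("hello\nworld"));
    }
}
```

The catch block is written once, inside Unchecked.call. Whether turning a checked exception into an unchecked one is acceptable is a separate design question; the sketch only shows that the repetition itself is avoidable.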

By sean at Sat, 2006-05-20 16:01

Exactly

Ergo, the best way to program in Java is to use an environment that removes the need to either read or write it!

This is, of course, what professional Java developers actually do. To take the example you gave of try-catch blocks, good IDEs will prevent you from ever having to type them. If you enter something that could throw a checked exception, modern IDEs prompt you for how you want that exception handled, and generate all of the necessary try-catches and exception declarations for you.

For readability, de gustibus. "Terse" and "non-redundant" all too often work out in practice to "lacking any internal assistance to understanding" and "poorly suited to automated or manual error detection and correction". You end up with programs that are easy to write, and tough for others to maintain.

By Dave Griffith at Sat, 2006-05-20 18:06

I find a good binding

I find a good binding structure with the option to define things either before or after use goes a long way to eliminating redundancy while retaining some assistance to understanding - for example I seem to use let rather than where for taking things apart in Haskell but switch between the two if I'm naming a common factor depending on whether I expect the reader to sufficiently understand the named concepts without the definition or not.

That said, my ability to write run-on sentences probably has a lot to do with why I'm writing functional code in the first place!

By Philippa Cowderoy at Sat, 2006-05-20 18:36

Um...

Yes, quite.

By Peter McArthur at Sun, 2006-05-21 08:51

I guess my point is that let

I guess my point is that let and where are generally pretty terse, certainly more so than not using them if you get to eliminate redundancy in the process. But they're also reasonably well-structured for the reader.

By Philippa Cowderoy at Sun, 2006-05-21 13:32

local-in-end

Let/where is essential, but fairly recently I've come to think that syntactic support for local-in-end, as in Standard ML, is also important and missing from many languages. (I say "syntactic support", because you can usually emulate it (semantically) more or less painfully in other languages.) Local-in-end by itself is like a minimalistic module system that allows you to hide the implementation of a module. (I confess, I'm usually anal about scope.)

By Vesa Karvonen at Sun, 2006-05-21 18:18

Does local-in-end differ

Does local-in-end differ significantly from using where in Haskell, or possibly a let as the outermost expression of a definition? I had a quick poke around Google, and I suspect the answer's going to have something to do with modules and/or Haskell's not having a powerful enough module system to care about the relevant distinction?

By Philippa Cowderoy at Sun, 2006-05-21 21:16

local-in-end allows one or

local-in-end allows one or more definitions to share one or more local definitions. Informally, in

local A in B end ; C

declarations in A are visible in B, but only declarations in B are visible in C.

In Haskell you can use pattern matching binding (to bind a tuple of definitions) and let for a subset of the functionality of local-in-end.

Scheme, IMO, sorely needs local-in-end. You can emulate a small subset of local-in-end at top-level with code like this:

(define B1 #f)
...
(define Bn #f)
(let ()
  (define A1 ...)
  ...
  (define Am ...)
  (set! B1 ...)
  ...
  (set! Bn ...))
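For comparison, a rough Java analogue of the same trick, as a sketch only: a static initializer block gives the auxiliary definitions a scope of their own, while the fields it assigns stay visible outside. The names Limits, MAX_ROWS and MAX_CELLS are invented for the example.

```java
// Hypothetical example: the names Limits, MAX_ROWS and MAX_CELLS are made up.
final class Limits {
    // These play the role of B1 ... Bn: visible to the rest of the program.
    static final int MAX_ROWS;
    static final int MAX_CELLS;

    static {
        // This plays the role of A: visible only inside this block.
        int base = 16;
        MAX_ROWS = base * 4;
        MAX_CELLS = MAX_ROWS * base;
    }

    private Limits() {}
}

class LocalInEndDemo {
    public static void main(String[] args) {
        System.out.println(Limits.MAX_ROWS + " " + Limits.MAX_CELLS);
    }
}
```

Unlike the Scheme version, no placeholder values and no set! are needed, because a final field may be assigned exactly once in the initializer. Like the Scheme version, it covers only a small subset of local-in-end: it works for values, not for arbitrary declarations.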

By Vesa Karvonen at Sun, 2006-05-21 22:20

So "where for binding

So "where for binding (sub)groups" might be a reasonable summary? I'd been thinking automatically-imported submodules might be a reasonable approach for that kind of thing in Haskell, it doesn't seem an issue once you're away from top-level scope (though I guess there's the tuple case that's a bit ugly). Actually, if I've understood correctly then I think I really like "where for binding groups" - if one of my language experiments gets to the point of pleasant concrete syntax I'll probably implement it.

By Philippa Cowderoy at Sun, 2006-05-21 22:44

I think I get you

It occurs to me that it is possible to implement local ... in ... end with R5RS Scheme macros.

We can't write a direct implementation, because macros can't create top-level bindings, but we can reify the notion of a sub-module as a namespace, e.g. by transforming this:

(define-namespace foo
  (a1 ...)
  (a2 ...))

(define b1
  (using-namespace foo
    ...))

(define b2
  (using-namespace foo
    ...))

into this:

(define foo
  (let*
    ((a1 ...)
     (a2 ...))
    (list a1 a2))) ; Define foo as a list of the bound values (a tuple)

(define b1
  (let*
    ((a1 (car foo))
     (a2 (cadr foo)))
    (...)))

(define b2
  (let* ...))
etc.

This is similar in principle to the Haskell technique of binding tuples.

I've fudged the issue of variable capture: we need the code inside the definition of b1 to see the bindings for a1 and a2, and hygienic macros are supposed to prevent this. There are, however, ways to get around this limitation [.pdf].

Am I onto something here, or is this old news?

By Peter McArthur at Mon, 2006-05-22 09:43

Relict

Strange, I have always considered "local" an anachronism from SML's pre-module days 20 years ago. :-) I never use it. It also does not play well with ML's requirement to make mutual recursion between declarations explicit (i.e. recursion cannot cross "in", nor "end").

By Andreas Rossberg at Mon, 2006-05-22 12:10

Yes, I've noticed that some

Yes, I've noticed that some people find little use for local-in-end (see Paulson's book, for example), but I've found it quite useful for a variety of purposes. One use is as a namespace filter (I picked up this idiom from the MLton codebase, which uses it rather regularly):

local
  open Module
in
val A = A (* Module.A *)
val B = B (* Module.B *)
(*...*)
end

This combines (nearly) the convenience of open with (nearly) the safety of long identifiers. Adding new names to the Module cannot inadvertently change the meaning of the program and you still get to use short identifiers. For the (innocent) reader, this also makes it easier to see where the short identifiers are coming from.

Another use is when I want to use some DSL:

local
  open DSL
  (* definitions of auxiliary combinators *)
in
(* the specific combinations of the general DSL combinators that I want *)
end

This avoids having to pollute the surrounding namespace with names defined by the DSL, which can be made rich without worry.

I also often use local-in-end for ad hoc "make" functions:

local
  fun make ... = ...
in
val a = make ...
val b = make ...
val c = make ...
end

But, in general, local-in-end allows one to be very precise about scoping. In particular, local-in-end helps to minimize the scope of auxiliary bindings so that they don't interfere with other bindings.

All in all, I find local-in-end very useful, and use it quite often. In the uses that I put local-in-end to, lack of mutual recursion isn't a recurring issue.

By Vesa Karvonen at Mon, 2006-05-22 20:04

Ah, I get you now

deleted (lowered the tone of the conversation)

By Peter McArthur at Sun, 2006-05-21 19:42

Greenspun's Law?

Ergo, the best way to program in Java is to use an environment that removes the need to either read or write it!

I propose a new version of Greenspun's Law for the noughties:

Any sufficiently advanced Java IDE contains an ad-hoc, half-assed implementation of a template meta-programming language.

Or, if we take the contrapositive:

Any Java IDE that does not contain an ad-hoc, half-assed implementation of a template meta-programming language is insufficiently advanced.

Modesty forbids me from naming this McArthur's Law. I'll leave that to my peers.

"Terse" and "non-redundant" all too often work out in practice to "lacking any internal assistance to understanding" and "poorly suited to automated or manual error detection and correction". You end up with programs that are easy to write, and tough for others to maintain.

Fair comment. Redundant languages force the programmer to annotate his line-noise. Non-redundant languages allow the programmer to match the structure of his code to his mental model of the program, without all that syntactic cruft getting in the way.

I suspect that it's not so much a matter of taste as a matter of whose code one is obliged to read!

By Peter McArthur at Sun, 2006-05-21 09:48

Yup

Any Java IDE that does not contain an ad-hoc, half-assed implementation of a template meta-programming language is insufficiently advanced.

All the popular ones certainly do, and the most technologically advanced probably contain four or five different-but-related ones, depending on how you count.

By Dave Griffith at Sun, 2006-05-21 15:52

Re: Greenspun's Law?

Peter McArthur wrote:

Modesty forbids me from naming this McArthur's Law. I'll leave that to my peers.

McArthur's law is insufficiently dissimilar from Peter Seibel's Corollary to Greenspun's 10th [Mon, Nov 29 2004, Message-ID: <m3sm6s8fgv.fsf@javamonkey.com>]:

Any sufficiently complicated Java program requires a programmable IDE to make up for the half of Common Lisp not implemented in the program itself.

By el-vadimo at Mon, 2006-05-22 22:03

Aw shucks!

Somebody beat me to it.

By Peter McArthur at Tue, 2006-05-23 09:52

Non-redundancy

"Terse" and "non-redundant" all too often work out in practice to "lacking any internal assistance to understanding" and "poorly suited to automated or manual error detection and correction".

Well, I agree that sometimes notations can be too terse for their own good, but in my experience non-redundancy, which I interpret here primarily as lack of duplication, is a very good thing. My experience has been that eliminating duplication often leads to significant design insights (like the discovery of properly tail-recursive implementation of function calls) and generally improves program design by making the semantics of the program clearer due to elimination of unnecessary special cases.

By Vesa Karvonen at Sun, 2006-05-21 16:17

yes

and a truly good editor would dynamically transform the source syntax and indenting style to our favorite syntax and indenting style and vice-versa, like reading and writing code in scheme and have it saved in java... ;)

By rmalafaia at Sat, 2006-05-20 19:27

Cost

Cost to upgrade modern Java editors with the functionality the authors suggest: 20 person-years, worst case (adding a half-dozen medium-sized features to four different IDEs). Best case, some enterprising hobbyists are already halfway through a plugin implementation. No significant technical risk.

Cost to drive a functional language to Java-like levels of penetration: 200,000 person-years, best case (porting to a thousand platforms from cell-phones to massive computing grids, building out hundreds of high-quality libraries, deploying to a billion browsers, writing hundreds of books and articles, but mostly training, training, and more training). Technical risk: no one has any idea if it is even theoretically possible to scale a development community based on functional languages to those levels, given current technologies and educational infrastructure.

I'm not sure I agree 100% with your police work, there.

By Dave Griffith at Sun, 2006-05-21 14:22

Unaware of well known techniques

I think that one big problem with the paper is that they didn't consult (or, if they did, it was not mentioned) any outside expert programmers for comments on the C&P patterns. Looking at the examples (all of them), I immediately see well known techniques, idioms, and patterns to avoid the repetitions (even in Java).

Perhaps this is a problem with their goal. They wanted to find "well defined C&P usage patterns" so that they could suggest "a set of tools" to "reduce maintenance problems incurred by C&P". IMO, the paper just doesn't contain any convincing examples of C&P that should be encouraged by IDE support. (You can probably tell that I'm not in favour of C&P programming.)

The example in figure 6 of traversing over elements in a DOM document, for example, is something that I would do with a Template Method or Strategy/Visitor in Java. IMO, C&P of this kind of code just isn't necessary (even) in Java and encouraging it with tool support would be misguided. What programmers copying code like this need is education on programming techniques not power tools.
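A minimal sketch of the Template Method version, to make the point concrete. It uses a stand-in tree type (Elem) rather than a real DOM, and all class names here are hypothetical; it is not the paper's code:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in tree type for illustration; the paper's example uses a real DOM.
final class Elem {
    final String name;
    final List<Elem> children = new ArrayList<>();

    Elem(String name, Elem... kids) {
        this.name = name;
        for (Elem kid : kids) {
            children.add(kid);
        }
    }
}

// Template Method: the traversal is written once; subclasses fill in visit().
abstract class ElemWalker {
    final void walk(Elem root) {
        visit(root);
        for (Elem child : root.children) {
            walk(child);
        }
    }

    protected abstract void visit(Elem e);
}

// One concrete use: collect element names in pre-order.
final class NameCollector extends ElemWalker {
    final List<String> names = new ArrayList<>();

    @Override
    protected void visit(Elem e) {
        names.add(e.name);
    }
}

class WalkDemo {
    static List<String> namesOf(Elem root) {
        NameCollector collector = new NameCollector();
        collector.walk(root);
        return collector.names;
    }

    public static void main(String[] args) {
        Elem doc = new Elem("html", new Elem("head"), new Elem("body", new Elem("p")));
        System.out.println(namesOf(doc)); // prints [html, head, body, p]
    }
}
```

Each new use of the traversal is then a small subclass that overrides visit(), instead of another pasted copy of the loop.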

By Vesa Karvonen at Sun, 2006-05-21 15:50

Seems appropriate....

The Psychology of Repetitive Reading

Human beings can be induced to carry out many kinds of repetitive actions.
In this experiment, the author asked 200 subjects to read a very repetitive
essay. The essay consisted of a single paragraph repeated several times.
Each subject was told beforehand that the essay was highly repetitive. The
result was surprising. Ninety-two percent of the subjects read the essay
completely from beginning to end.

By John Carter at Mon, 2006-05-22 00:58

Seems very different

I haven't (at least not yet) read your reference, but asking someone to read a single repetitive essay once is very different from maintaining a highly repetitively written legacy application ad infinitum. These days snippets from such legacy applications regularly end up at TheDailyWTF.

By Vesa Karvonen at Mon, 2006-05-22 06:19

Copy-Pasting a lot

The solution should be meta-programming and code generation. The basic concept is to write a program that can write programs for you. In simple cases, a macro system can avoid much copying and pasting. I don't think C&P is a good programming style even if a smart editor is used.

By Lee Chun Kin at Sat, 2006-05-20 17:51

metaprogramming?

Q: we have to type a lot

A: let's use an IDE that'll do the typing for us

Q: we still need to read the code the IDE typed for us

A: let's use metaprogramming or codegen - you'll only have to read the macros that create the code

Q: I don't understand the stacktrace the debugger gave me

A: whoops, maybe metaprogramming wasn't such a good idea, what we really need is a new semantic level

Bjarne Stroustrup mentions in his D&E book a lesser-known point about Cfront, his first C++ compiler. Everyone knows that Cfront was a preprocessor that output C code; not everyone knows that Cfront did all the error handling by itself (i.e. any error in the generated C code meant an error in Cfront, not an error in the C++ input). In Joel Spolsky terms, Stroustrup tried to make sure the abstraction didn't leak.

By Vladimir Slepnev at Sun, 2006-05-21 09:14

Semantic levels

You guys seem to be talking about an interpreter written in Java. An interpreter puts together blocks of code and runs the code. If you like what the interpreter is doing it can simply stuff the code into a file for later use. That is called compiling. An interpreter is a good way to think about and construct a translation between semantic levels.

By Hank Thediek at Sun, 2006-05-21 12:24

Language design perspective

The real question is which (kind of) well defined language features would eliminate the need for the well defined C&P patterns.

By Vesa Karvonen at Sun, 2006-05-21 16:44

Armchair abstraction

You can sit back in your armchair and look at any finished program and find ways to better abstract out patterns of duplication. But these are not so evident while the program is being developed. And even replacing duplication with abstraction after the fact is not always a win. It makes the code more abstract, and thus harder to comprehend, and requires a larger vocabulary of abstractions to be mastered before the code can even be read. Verbose repetitive code that is concrete and clear can be a win in practice.

There has been some attempt to rationalize these kinds of tradeoffs in the "attention investment model".

I wish we could have the best of both worlds. We might be able to smoothly transition from copy & paste to functional abstraction if we saw them as special cases of a more general mechanism. I discuss "inclusions" in the paper cited above, but haven't demonstrated yet that they would work for this purpose.

By Jonathan Edwards at Sun, 2006-05-21 21:04

Refactor mercilessly until you're begging for mercy

I'm glad someone stepped out of the groupthink for a bit. Yes, there can be too much abstraction and yes, sometimes a little redundant code can make things easier to comprehend.

Hindsight is 20/20 and stepping away from the problem for a while (5 minutes even) can lead to new insights. In the real world, we don't have the time to make the "perfect" code.

That said, I do find a language like Java to have way too much syntactic redundancy. With today's smart IDEs, it's not so much the writing of the code that is the problem, but the reading of it. You still have to unfold a fold to see what's going on. It's interesting to note that JetBrains (who make the IDEA IDE), are also working on a DSL tool. Are the smart IDE developers leading Java programmers to intentional programming?

Of course, I don't see any reason that you couldn't have your cake and eat it too, but most functional languages have nowhere near the smart IDE support that Java has. I'd love to see a smart IDE for something like Nemerle.

By Dave Lopez at Sun, 2006-05-21 22:09

HOFs vs loops

You still have to unfold a fold to see what's going on.

I'm not sure what you mean here, but I find code using well selected HOFs easier to read. In typical Java (or C or C++) code, almost everything is done with ad hoc loops. You have to carefully analyze such loops (to mentally unfold them, which may be what you meant) to discover the underlying intention of the loop. With HOFs much of the intention is spelled out explicitly, either it is map, filter, find, or ...
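A small side-by-side sketch of the contrast, assuming a later Java (8 or newer) with the streams API, which postdates this discussion. The class and method names are made up for the example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class LoopVsHof {
    // Ad hoc loop: the reader has to work out that this is "filter, then map".
    static List<Integer> squaresOfEvensLoop(List<Integer> xs) {
        List<Integer> out = new ArrayList<>();
        for (int x : xs) {
            if (x % 2 == 0) {
                out.add(x * x);
            }
        }
        return out;
    }

    // Same computation with the intention spelled out by the HOF names.
    static List<Integer> squaresOfEvensHof(List<Integer> xs) {
        return xs.stream()
                 .filter(x -> x % 2 == 0)
                 .map(x -> x * x)
                 .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4);
        System.out.println(squaresOfEvensLoop(xs)); // prints [4, 16]
        System.out.println(squaresOfEvensHof(xs));  // prints [4, 16]
    }
}
```

Both methods return the same list; the difference is only in how much of the intention the reader has to reconstruct.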

By Vesa Karvonen at Sun, 2006-05-21 22:32

Code Folding

As in what editors do to hide method bodies. Sorry for the confusion.

By Dave Lopez at Sun, 2006-05-21 22:59

Hindsight is a benefit

You can sit back in your armchair and look at any finished program and find ways to better abstract out patterns of duplication.

Let me quote Michael Feathers:

I like working with programmers who write a bit of code and then ask themselves "is there some simpler way of doing this?" If the answer is yes, they stop and redo it before going on.

So, while I agree that hindsight is 20-20, I consider that a benefit rather than a hindrance. While working on a project, I always keep an eye on how to better factor the program. To me, a piece of code is never "finished". When I realize an opportunity to improve things, I try it out and if it actually makes things better, I keep the changes.

But these are not so evident while the program is being developed. And even replacing duplication with abstraction after the fact is not always a win. It makes the code more abstract, and thus harder to comprehend, and requires a larger vocabulary of abstractions to be mastered before the code can even be read. Verbose repetitive code that is concrete and clear can be a win in practice.

So, if it was hard to write, it must be hard to read(, modify, and maintain in general).

I definitely disagree here. Most duplication is usually very easy to spot. You just need to keep your eyes open. I've had more than my share of dealing with repetitive legacy code full of duplication and I see little advantage in fixing the same bugs repeatedly. Ask me to choose between learning a new abstraction that eliminates a form of duplication vs fixing the same bug over and over again, I'd choose learning the abstraction any day.

By Vesa Karvonen at Sun, 2006-05-21 22:48

An individual's perception of code

So, while I agree that hindsight is 20-20, I consider that a benefit rather than a hindrance. While working on a project, I always keep an eye on how to better factor the program. To me, a piece of code is never "finished". When I realize an opportunity to improve things, I try it out and if it actually makes things better, I keep the changes.

If it's never "finished" then how do you move on? Even the maintenance guy doesn't have an infinite amount of time to keep on going back to refactor. You refactor, then you have to write tests, and even with tests there might be subtle regressions...

I guess my point is that I feel that at some point, it's time to move on.

So, if it was hard to write, it must be hard to read(, modify, and maintain in general).

These days, in at least the Java world, there's been a lot of criticism about over-abstractions - abstraction over abstraction over abstraction. Maybe this conflicts with "Do the simplest thing possible" sometimes.

But I'm not sure that abstractions always lead to easier understanding of code.

By Dave Lopez at Sun, 2006-05-21 23:12

Code is malleable

If it's never "finished" then how do you move on?

What I mean is that I don't consider code to be cast in stone. When I learn a new abstraction technique, I revisit code that I think could benefit from the abstraction.

But I'm not sure that abstractions always lead to easier understanding of code.

An abstraction is something that you learn once and apply multiple times. It may be more difficult to understand an abstraction than a concretization, but once you understand the abstraction, you don't have to think about it anymore.

By Vesa Karvonen at Sun, 2006-05-21 23:29

Yep, it is and thus easy C&P

An abstraction is something that you learn once and apply multiple times. It may be more difficult to understand an abstraction than a concretization, but once you understand the abstraction, you don't have to think about it anymore.

Code is malleable and that's why C&P is used. And I'm not buying that moving a little redundant code into a method is necessarily an "abstraction" and will necessarily make things easier to understand. Sometimes you just want some code to look at, and it doesn't really have a particular significance outside of a local context.

We have lambda functions and local functions and those aren't necessarily abstractions. We could put them into a module or class-level method if we wanted to. So you're not necessarily re-using that multiple times.

By Dave Lopez at Mon, 2006-05-22 02:37

I'm not buying that moving a

I'm not buying that moving a little redundant code into a method is necessarily an "abstraction"

No argument here. One needs to understand what to turn into an abstraction and what not to when eliminating repetition.

Sometimes you just want some code to look at, and it doesn't really have a particular significance outside of a local context.

I'm not sure what your point here is, but I never want just "some code" to look at. I want to look at the precise section of code that needs to be considered (e.g. to fix a bug or add a new feature). In code formed by C&P followed by some ad hoc editing it is generally very hard to look at the precise section of code that needs to be considered, because it may cut over multiple functions and even multiple files in multiple modules.

We have lambda functions and local functions and those aren't necessarily abstractions.

Ad hoc lambdas are often used to supply the concrete operations to an abstract HOF (like map or filter). The abstraction is the HOF.

Carefully named local functions are often used for improving readability by giving abstract, readable names to concrete, detailed calculations even when the calculation is used only once. Such local functions are abstractions.

By Vesa Karvonen at Mon, 2006-05-22 06:42

It seems to me that rather

It seems to me that rather than "finishing" the issue becomes prioritisation?

I suspect part of the problem in the Java world with over-abstraction is that Java doesn't make recombining quick and easy (I don't know enough to comment about whether it results in inefficient programs with modern JVMs).

By Philippa Cowderoy at Sun, 2006-05-21 23:43

It really must depend on the

It really must depend on the details of the duplication. If the code is duplicated a few times and within short spatial distance it might not be so bad.

Also, you should be careful not to abstract away coincidental duplication. Two pieces of code could look just the same but for vastly different reasons, so you don't want changes to one to affect the other.

By Felicia Li Svilling at Sun, 2006-05-21 23:32

While refactoring, keep the Open-Closed Principle in mind

If the code is duplicated a few times and within short spatial distance it might not be so bad.

When the duplicates are close to each other, it is usually easiest to eliminate the duplication. Just use a local definition (let, where, local, ...).

Two pieces of code could look just the same but for vastly different reasons, so you don't want changes to one to affect the other.

Do you have a good example of this?

But, indeed, if two similar looking snippets of code are in an intermediate stage being edited to something vastly different from each other, then it is too early for hindsight.

By Vesa Karvonen at Sun, 2006-05-21 23:52

cpd - Copy&Paste Detector

CPD Copy and Paste Detector - Not only for Java.

By John Carter at Mon, 2006-05-22 05:31

Eric S Raymond's Comparator

Comparator
comparator and filterator are a pair of tools for rapidly finding common code segments in large source trees. They can be useful as tools for detecting copyright infringement.

By Ewan at Mon, 2006-05-22 10:23

Abstraction is not a panacea

An abstraction is something that you learn once and apply multiple times. It may be more difficult to understand an abstraction than a concretization, but once you understand the abstraction, you don't have to think about it anymore.

How about a month later? Or a year? How about the 10 other people who need to modify your code?

Abstraction is great for global, stable concepts. Like math. It can be a loss for ad hoc, local, and impermanent concepts. Most of programming is ad hoc, local, and impermanent.

By Jonathan Edwards at Mon, 2006-05-22 15:22

Yeah! Down with names!

I think you need to define what kind of abstraction you're talking about. A function is an abstraction. Defining a variable is an abstraction. These things can be a loss, but most of the time, applied with a minimal amount of savvy, they're not, otherwise we'd all be programming in machine code or some programming language that deprecates naming.

Further, once you define what kind of "big" abstractions I assume you're really referring to, I suspect it'll turn out that they're only a loss in certain contexts. Perhaps you're thinking of abstractions in languages which don't abstract well, for example, where even small abstractions are heavyweight (e.g. defining a class where a simple anonymous function would do). Or perhaps you're thinking of poorly-thought-out abstractions, i.e. abstraction done badly is a loss.

IOW, your anti-abstraction kick may actually be an anti-bad-language or anti-bad-design kick. If so, I can agree with it (properly restated), but otherwise, it seems misguided, at least without a much clearer treatment of the issues, examples thereof, and analysis of alternatives that might address the issues without canning abstraction.

By Anton van Straaten at Mon, 2006-05-22 16:49

Language designer, know thy user

A function is an abstraction. Defining a variable is an abstraction. These things can be a loss, but most of the time, applied with a minimal amount of savvy, they're not, otherwise we'd all be programming in machine code or some programming language that deprecates naming.

Or languages without higher-order functions or macros. Wait - that is what we're all programming in. I am just trying to offer a more nuanced explanation of this phenomenon than the mantra of "Java programmers are stupid".

By Jonathan Edwards at Mon, 2006-05-22 19:13

Remember the Template Method and the Strategy patterns

Java isn't entirely without anything like higher-order functions. In my experience, the Template Method pattern isn't too bad for emulating HOFs. The same goes for the Strategy pattern for emulating lambdas.

What Java lacks is syntactically light ways to combine little abstractions.
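To illustrate, here is a sketch of the Strategy pattern standing in for a function argument, written in the pre-lambda style of the time. The interface name Transform and the helper map are hypothetical, chosen only for the example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Strategy interface standing in for a function type (hypothetical name).
interface Transform<A, B> {
    B apply(A a);
}

class StrategyDemo {
    // Plays the role of the HOF "map": the loop is written once.
    static <A, B> List<B> map(List<A> xs, Transform<A, B> f) {
        List<B> out = new ArrayList<B>();
        for (A x : xs) {
            out.add(f.apply(x));
        }
        return out;
    }

    static List<Integer> lengths(List<String> words) {
        // The anonymous inner class plays the role of a lambda.
        return map(words, new Transform<String, Integer>() {
            public Integer apply(String s) {
                return s.length();
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(lengths(Arrays.asList("let", "where", "local"))); // prints [3, 5, 5]
    }
}
```

The one-line computation s.length() is wrapped in five lines of ceremony at the call site, which is the "syntactically light" part that is missing.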

By Vesa Karvonen at Mon, 2006-05-22 19:28

HOFs are less abstract?

Java's anonymous inner classes provide the equivalent of higher order functions (or sets thereof). They're still not exactly syntactically light, but I think the interesting point was that Java wasn't able to get away without them.

Actually, one could argue (a bit perversely perhaps) that without HOFs, requiring a separate class to be defined and instantiated in order to fake HOFs is an example of over-abstraction. Anonymous HOFs thus allow you to reduce the abstraction level, so Jonathan should be promoting them to the abstraction-wary.

By Anton van Straaten at Mon, 2006-05-22 20:28 | login or register to post comments

Know thy user? If only!

Based on my own career on the non-higher-order side of the programming fence, it seems to me that what we've been seeing over the last thirty years is just an incredibly slow movement towards a grudging acceptance that some of these non-obvious forms of abstraction actually have their uses even in mainstream languages, or would if the designers would let them in. The idea that we have what we currently have because that's what users are best suited to completely ignores the history of how we got here, and trends in language development.

I don't think Java programmers are stupid, but I do think that the effect of the language designer's instinct to be conservative and restrictive in the perceived interests of their user communities has frequently bordered on stupidity. Although the underlying causes are not always intransigence on the part of language designers, the net effect is often that of appearing not to "know thy user" very well at all.

These decisions simply force user communities to work around shortcomings at great effort and cost. Features get clumsily tacked onto languages to deal with these shortcomings: for example, inner classes were added to Java because it turns out that hey, having something approximating higher-order functions really is important, even if you just want to call them "callbacks". Code generators make up for lack of macros - the point is not whether macros are part of the language, but rather that the abstractions happen anyway: people are writing things in languages other than Java - e.g. XML dialects or other DSLs - and then generating Java code from that.
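To show what "generating Java code" can look like at its smallest, here is a hedged sketch of a hand-rolled generator. It is not taken from any real tool; `BeanGenerator` and its `generate` method are hypothetical. It turns a field specification (the sort of thing that might come from an XML dialect or other DSL) into Java source text with getters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BeanGenerator {
    // Emits Java source for a class with one private field and one getter
    // per entry in `fields` (field name -> type name), in insertion order.
    static String generate(String className, Map<String, String> fields) {
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(className).append(" {\n");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            String name = f.getKey();
            String type = f.getValue();
            // Capitalize the first letter to form the getter name.
            String cap = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            src.append("    private ").append(type).append(' ').append(name).append(";\n");
            src.append("    public ").append(type).append(" get").append(cap)
               .append("() { return ").append(name).append("; }\n");
        }
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("x", "int");
        fields.put("name", "String");
        System.out.print(generate("Point", fields));
    }
}
```

The generator does the job a macro would do, but as string concatenation outside the language, so nothing checks the emitted code until it is compiled separately.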

Of course it's possible to over-abstract, or choose the wrong abstractions for a problem. That just means you're doing abstraction incorrectly. It's also true that people who experience difficulty with abstraction might be better able to achieve what they want through manual repetition than through a more factored design. But my experience is that such people aren't the ones who succeed in designing systems that survive - their systems are the very systems that collapse under their own weight, becoming unmaintainable as they respond to evolving requirements.

I think it's quite likely that tools which cater to the cut-and-paste instinct, but are smarter than the human user about what they do under the hood, will be useful. And maybe they'll one day get to the point where they can compete with humans who are good at abstraction. But I suspect that'll be found to be a pretty hard problem, because after all, what you're doing is trying to create a machine that's better at abstraction than a human. That sounds suspiciously close to AI.

I agree that nuance in this area is essential, because there are so many interrelated factors, social, technical, infrastructural etc. For me, this means that even a statement like "Verbose repetitive code that is concrete and clear can be a win in practice" requires a lot of nuance to back it up: "can be" under what circumstances? What kind of programmers are involved? What kind of systems? Could it be refactored by someone who can make the result maintainable even by less abstraction-oriented programmers? Why isn't the repetitiveness or verbosity a problem from a maintenance perspective? Is the problem that the resources don't exist to get the system designed and rewritten in a more maintainable way?

Anyone who's worked with real systems of any significance has had to grapple with things like this quite directly, and I doubt that many of them would extol the virtues of "verbose repetitive code". It would be nice to see examples of the sort of thing you're thinking of, though, since this discussion is just a little too... abstract. :)

By Anton van Straaten at Mon, 2006-05-22 20:20 | login or register to post comments

Or languages without

Or languages without higher-order functions or macros. Wait - that is what we're all programming in. I am just trying to offer a more nuanced explanation of this phenomenon than the mantra of "Java programmers are stupid".

As Anton already pointed out, I don't believe this is actually correct. Java has anonymous inner classes (a hack, an attempt to make up for the gap), C# 2.0 has anonymous delegates (less of a hack than anonymous inner classes, but still pretty messy), and when C# 3.0 comes out, it will have type-inferred lambda expressions (much better syntax than the anonymous delegates). Also, code generators are getting to be pretty popular in both Java and C#. Think about this:

These decisions simply force user communities to work around shortcomings at great effort and cost. Features get clumsily tacked onto languages to deal with these shortcomings: for example, inner classes were added to Java because it turns out that hey, having something approximating higher-order functions really is important, even if you just want to call them "callbacks". Code generators make up for lack of macros - the point is not whether macros are part of the language, but rather that the abstractions happen anyway: people are writing things in languages other than Java - e.g. XML dialects or other DSLs - and then generating Java code from that. - Anton van Straaten

Programmers end up "rediscovering" the concepts behind the abstractions you mention (macros and higher order functions) and then end up having to hack their way around the lack of support for these concepts in their language. Isn't that remarkable? Indeed, more than remarkable: it's insane. (Edit: By insane I don't mean it's insane that Java or C# programmers write code generators, I mean it's insane that programmers end up having to rediscover these concepts because the language forbids the creation of said abstractions on the grounds that they're too dangerous, yet the resulting code generators are at least as dangerous and certainly more buggy.)

"Macros are too dangerous. We can't include support for macros or else careless programmers will make use of them and make a mess," the language designers say. Yet that gives programmers no other option than to roll their own when they deem it necessary (regardless of whether that judgment is justified). "The abstractions happen anyway." But this introduces an additional point of failure: the generator itself could break (a macro system could break too, but since it is part of the language, that is much less likely).

By Benjamin D. Cutler at Fri, 2006-05-26 19:13 | login or register to post comments

monday

One issue I can think of is utilization of abstractions for real world(tm) entities. The problem is that in the real world(tm), definitions that people made just don't stick sometimes (understatement?), unlike in math. Thus, a beautiful abstraction one made earlier needs to be deconstructed because someone (usually from the management/business side of things) wants to change the definition now (they never invent new names for some reason).

By Koray Can at Mon, 2006-05-22 19:25 | login or register to post comments

Language of the Academy of Lagado

Funny, on the bus home from work, I was having thoughts very much like Anton van Straaten's. Indeed, abstraction is largely about giving names to things. In programming, you create a computational abstraction so that you don't have to think about the details of implementing the computational process when you want to refer to it.

If you really think abstraction is a bad idea, you should start with something like the SK-combinators and add only the permanent concepts as combinators that you want.

More seriously, I think that you might want to look at the use of fibration in ML-like module systems. I'm not into subtext, but the little I know reminds me of (some sort of ad hoc use of) fibration. See the book Advanced Topics in Types and Programming Languages.

Back to abstraction. You say that most of programming is "ad hoc, local, and impermanent". I believe I understand what you mean, but I think that it is also misguided to use that as an excuse to avoid creating new abstractions. In my experience, the best kind of abstractions form domain specific languages that allow one to specify ad hoc constructions of some kind concisely without mentioning implementation details. In other words, and in my opinion, programs should be factored in such a way that the specific (or concrete) and the general (or abstract) are clearly separated from each other. When programs are factored that way they read like specifications.

By Vesa Karvonen at Mon, 2006-05-22 19:17 | login or register to post comments

Antidisabstractionism

In my experience, the best kind of abstractions form domain specific languages that allow one to specify ad hoc constructions of some kind concisely without mentioning implementation details.

I completely agree. I am not anti-abstraction. I am anti-abstraction-worship. There are pragmatic reasons why copy & paste is appropriate when abstraction would be premature or overkill. I would like to make abstraction easier by allowing one to refactor into it in small steps from webs of ad hoc copy & paste. Also to allow abstraction without losing the concrete (via essentially edit-time inlining).

Thanks for the tip on fibrations. I think it is very telling that module systems often lead to different language constructs than functions. I think that is a bug.

By Jonathan Edwards at Mon, 2006-05-22 19:47 | login or register to post comments

False Dichotomies-R-Us

If you really think abstraction is a bad idea, you should start with something like the SK-combinators and add only the permanent concepts as combinators that you want.

Back to abstraction. You say that most of programming is "ad hoc, local, and impermanent". I believe I understand what you mean, but I think that it is also misguided to use that as an excuse to avoid creating new abstractions.

Since we're going down this path, can I assume that your code is so overabstracted with indirection and/or inheritance hierarchies that it's next to impossible to figure out what's going on?

Probably not, so enough with the false dichotomy. Things in the real world are a little bit more subtle than that. And nobody has code that is C&P-free, or at least code that couldn't be refactored over and over again.

By Dave Lopez at Tue, 2006-05-23 01:59 | login or register to post comments
