Lambda the Ultimate

inactiveTopic Darcs
started 2/6/2004; 12:02:21 PM - last post 2/9/2004; 1:46:07 PM
Bryn Keller - Darcs  blueArrow
2/6/2004; 12:02:21 PM (reads: 9767, responses: 19)
Darcs
Not strictly about programming languages, but I thought I'd point out David Roundy's "darcs" (David's Advanced Revision Control System), since we've been talking about collaboration. It's basically a CVS replacement, written in Haskell, but the interesting thing about it is the theory of patches that David created. The blurb on that bit is below.

One of the things that interested me was the token-rename patch. It's a patch that says something like 'change all the names "Foo" to "Bar" in this section of code', so the patch includes semantic information, not just a textual diff.

I've been using automatic refactoring with IntelliJ IDEA for a while now, so the first thing that struck me was that revision control systems of the future might in fact deal more with refactorings than with diffs. I can imagine a patch like "Perform Extract Method on method Foo from line 12 to 30, and rename the new 'bar' paramter to 'baz'." Food for thought.

Theory of patches The development of a simplified theory of patches is what originally motivated me to create darcs. This patch formalism means that darcs patches have a set of properties, which make possible manipulations that couldn't be done in other revision control systems. First, every patch is invertible. Secondly, sequential patches (i.e. patches that are created in sequence, one after the other) can be reordered, although this reordering can fail, which means the second patch is dependent on the first. Thirdly, patches which are in parallel (i.e. both patches were created by modifying identical trees) can be merged, and the result of a set of merges is independent of the order in which the merges are performed. This last property is critical to darcs' philosophy, as it means that a particular version of a source tree is fully defined by the list of patches that are in it. i.e. there is no issue regarding the order in which merges are performed. For a more thorough discussion of darcs' theory of patches, see Appendix A.

Posted to Software-Eng by Bryn Keller on 2/6/04; 12:06:41 PM

Olivier Lefevre - Re: Darcs  blueArrow
2/6/2004; 12:30:05 PM (reads: 732, responses: 0)
Yes, this caught my attention, too. Comparing SCSs and databases, it is clear that SCSs are still in the pre-relational stage, so to speak, i.e., they suffer from a lack of a proper model. Does anyone know of other papers or software in this line of work?

And formal languages to describe refactorings?

Patrick Logan - Re: Darcs  blueArrow
2/6/2004; 2:05:00 PM (reads: 707, responses: 2)
the first thing that struck me was that revision control systems of the future might in fact deal more with refactorings than with diffs

This statement confuses changing a system by "refactoring" and changing a system to add or modify the behavior.

Most changes to a system are the latter, rather than the former. And so you would need a vocabulary to describe adds, removes, and deletes.

The more powerful idea, I think, would be to record changes to a system and then using those recordings to modify not just a replication of the system, but modify a dependent of that system.

Let's say the system under control is a "framework" of some kind. And say I have an application that uses the framework. What if the changes made to the next version of the framework could be used to generate and apply a similar set of changes to my application? (Flagging any differences that could not be automated or in some way requiring a judgement.)

Bryn Keller - Re: Darcs  blueArrow
2/6/2004; 2:34:32 PM (reads: 701, responses: 1)
This statement confuses changing a system by "refactoring" and changing a system to add or modify the behavior.

Fair enough, how about "program transformations" or something of the sort? Mostly what I'm trying to get at is that our version control systems are working purely at a textual level, which is pretty crude.

I'm interested in what happens when we trade in our textual model of source code for a structural one. Macros in C are to current version control systems as Scheme macros are to the system I'm trying to talk about.

So, for example, you don't just "add text", you add a method, a class, a package, a widget. Each patch would describe a program transformation of some kind, rather than a text cut/paste.

The more powerful idea, I think, would be to record changes to a system and then using those recordings to modify not just a replication of the system, but modify a dependent of that system.

Yes indeed. I believe darcs already supports this to some degree because it tracks patch dependencies. To be completely effective though, we have to model patches as program transformations. Then it can actually make sense to play those recordings back in some context other than the one they were originally used it.

Daniel Yokomiso - Re: Darcs  blueArrow
2/6/2004; 3:58:43 PM (reads: 680, responses: 0)
So, for example, you don't just "add text", you add a method, a class, a package, a widget. Each patch would describe a program transformation of some kind, rather than a text cut/paste.
These come to mind...

Program Restructuring to Introduce Design Patterns (1998)
http://citeseer.nj.nec.com/cinneide98program.html

Composite Refactorings for Java Programs (2000)
http://citeseer.nj.nec.com/cinneide00composite.html

Automated Application of Design Patterns to Legacy Code (1999)
http://citeseer.nj.nec.com/316105.html

Avi Bryant - Re: Darcs  blueArrow
2/6/2004; 5:16:07 PM (reads: 655, responses: 2)
So, for example, you don't just "add text", you add a method, a class, a package, a widget. Each patch would describe a program transformation of some kind, rather than a text cut/paste.

Most revision control systems for Smalltalk, including Monticello, which I wrote for Squeak, do exactly this: they model source code as a collection of packages, classes, and methods, and model changes as additions/removals/updates of these elements. Among other things, this makes merge and diff algorithms both trivial and robust.

This is hardly news; what's amazing is that most of the rest of the world (VisualAge for Java being a notable exception) doesn't do the same thing. As Kent Beck says, "source code in files - how quaint".

Bryn Keller - Re: Darcs  blueArrow
2/6/2004; 5:59:00 PM (reads: 670, responses: 1)
This is hardly news; what's amazing is that most of the rest of the world (VisualAge for Java being a notable exception) doesn't do the same thing. As Kent Beck says, "source code in files - how quaint".

Somehow I knew this quote would come up, but unless I'm greatly underestimating the state of the art in Smalltalk, it doesn't apply here. I picked "adding a method" as a simple example, but I intended to imply a wide range of program modifications. I'm talking about expressing every change to source code as a program transformation.

I understand Smalltalk keeps track of things at the level of methods, classes, and packages, rather than files, but I don't think (modulo my limited exposure to Smalltalk) that it does what I'm talking about.

Here's a clearer example (hopefully): Suppose I tell my editor to do something to my code, for example, "surround this code here with a try/catch block". I'm not asking it to figure out the intended semantics of my typing into the buffer, let's say I do it with a menu action so that it can tell what my intent is.

Now, when I commit this source back to the revision system, I don't want it to record my changes by way of diffing. I want it to apply the same source transformation to its source that I performed on my copy. The instruction to the version control system is

"surround lines 12-14 with a try block, catch block is as follows:...",

and not "insert text 'try {' at line 12" "insert text '} catch (Exception e) { ... }' at line 15".

Do any of the systems you referred to do this? Does anyone know of any such system?

andrew cooke - Re: Darcs  blueArrow
2/6/2004; 6:19:43 PM (reads: 668, responses: 0)
this sounds equivalent to having a programmable refactoring editor - you're saving code changes as instructions for the editor. since refactoring editors aren't that great yet (afaik ;o) presumably it doesn't exist.

i was checking out "hare" (http://www.cs.kent.ac.uk/projects/refactor-fp/hare.html) earlier this evening to see if it would turn a data type into a class. it's on their wishlist, but not implemented, so i've spent the evening rewriting. makes me think that maybe languages should be designed so that certain common refactorings are not that difficult. not sure if this is orthogonal to other design issues or not (it seems somewhat contradictory to the idea that different things should look different).

Avi Bryant - Re: Darcs  blueArrow
2/7/2004; 3:07:08 AM (reads: 604, responses: 0)
Here's a clearer example (hopefully): Suppose I tell my editor to do something to my code, for example, "surround this code here with a try/catch block". I'm not asking it to figure out the intended semantics of my typing into the buffer, let's say I do it with a menu action so that it can tell what my intent is.

Now, when I commit this source back to the revision system, I don't want it to record my changes by way of diffing. I want it to apply the same source transformation to its source that I performed on my copy.

There are definitely some interesting possibilities there. It would be great to do a "rename method" refactoring on one branch, merge it with another branch that has additional calls to the old method name, and have those all get caught and changed automatically.

And, no, I certainly don't know of any systems that currently do this, but you've definitely got me thinking about what I can do with the Refactoring Browser and Monticello.

However, I've found that purely delta-based systems in Smalltalk don't work very well - if all you're doing is recording changes, you really better not miss any, or you're going to get out of sync. And it's pretty hard to guarantee that you've got hooks into every place where someone might change code, and so will in fact see all the deltas. I've had much better success with a model that involves comparing complete snapshots of a package, and moving the system from one snapshot to another.

What might be possible is in this context to have a stack of active refactorings as part of the static state of the system - when updating a system from one state to another, you make sure to apply any new refactorings that have been accumulated. Ordering would be tricky here, though - you would need some way of knowing which changes came before or after a particular refactoring.

It's late, hopefully I'm making some vague sense here...

Paul Snively - Re: Darcs  blueArrow
2/9/2004; 9:17:37 AM (reads: 371, responses: 8)
Avi Bryant: There are definitely some interesting possibilities there. It would be great to do a "rename method" refactoring on one branch, merge it with another branch that has additional calls to the old method name, and have those all get caught and changed automatically.

And, no, I certainly don't know of any systems that currently do this

I believe, but have not tested, that darcs will handle this correctly.

This thread makes it sound as if all version control systems are just using diff/patch under the hood, and only deal with text. This isn't true, and it's particularly untrue of darcs. Another poster asked for a version control "model." That's precisely what darcs provides: it provides a model of a system that is neither more nor less than a set of patches applied to a (potentially empty) tree, and a theory of patches that defines what a "patch" is, and let me tell you, these aren't your grandfather's patches. The appendix defining the theory of patches is a must-read.

Bryn Keller - Re: Darcs  blueArrow
2/9/2004; 9:30:43 AM (reads: 373, responses: 7)
This thread makes it sound as if all version control systems are just using diff/patch under the hood, and only deal with text. This isn't true, and it's particularly untrue of darcs. Another poster asked for a version control "model." That's precisely what darcs provides: it provides a model of a system that is neither more nor less than a set of patches applied to a (potentially empty) tree, and a theory of patches that defines what a "patch" is, and let me tell you, these aren't your grandfather's patches. The appendix defining the theory of patches is a must-read.

But the patches are limited to a) rename-token, b) diff, and c) some merger/combination of smaller patches, right? So the fundamental unit of patching is still diffs (except for rename-token), it's just that we have a useful algebra for combining and manipulating patches. I'm all for it, and I think it's definitely progress, but it doesn't seem to me that this meets the critierion of storing most changes to source as little program transformation expressions. Basically I'm talking about having patches that work at the level of abstract syntax trees rather than text, and I don't see any evidence that darcs does that. The rename-token patch is very similar to the sort of things I'm talking about, though.

Neel Krishnaswami - Re: Darcs  blueArrow
2/9/2004; 1:13:19 PM (reads: 343, responses: 1)
Martin Erwig wrote a short paper, "Programs are Abstract Data Types", in which he proposes that changes to programs should form an algebra like Bryn suggests. It's not a fully worked out theory, but it might be relevant to the discussion here.

Ehud Lamm - Re: Darcs  blueArrow
2/9/2004; 1:46:07 PM (reads: 334, responses: 0)
Sound similar to the ideas I originally had regarding abstraction breaking operators (also not fully worked out, and I think this is mainly beacuse the approach is quite problematic...).

My paper is in LNCS 2043, by the way.

Oleg - Re: Darcs  blueArrow
2/9/2004; 4:20:19 PM (reads: 310, responses: 6)
Basically I'm talking about having patches that work at the level of abstract syntax trees rather than text

But what about the changes that do not affect the abstract syntax?

For example, I prefer the style (let ((var1 val1) (var2 val2)) body) whereas some other people prefer (let ([var1 val1] [var2 val2]) body). Both expressions differ only in concrete syntax. Also, sometimes a piece of code doesn't look nice and has to be reformatted. Looks are important, to me at least. The reformatting doesn't even change the syntax of an expression, merely the amount of white space here and there. How to account for such a change in a system that tracks only AST?

Finally, I do write comments in code. I often start with comments, especially for complex code. Comments need modifications too, e.g., to change their formatting, to add clarification, or note a question to think about. The comments usually appear at the beginning of a file, before a function, and sometimes before a code block (especially before a test alternative). To me, comments are just as important as code. Therefore, a revision-control system that merely tracks changes on the AST level doesn't appear attractive to me. I admit I'm old-fashioned, but to me code is text. I read code the same way I read comments or other scientific text, trying to understand what the text says. IMHO the presentation aspect is important and should be preserved by the system.

Bryn Keller - Re: Darcs  blueArrow
2/9/2004; 4:34:32 PM (reads: 306, responses: 5)
But what about the changes that do not affect the abstract syntax?

I don't mean that the system should only work at an AST level, just that patches should contain as much semantic information as possible, and as little cruft as possible. For instance, if you reformat a file according to some preferences you have, the patch should look like:

patch [ format [ style: scheme [ tab-width: 4, let-style: [inner: paren, outer: paren] ] ] ]

and not a big list of "at line 45 change char 59 from [ to ("

The important thing is that the patch carries more information about the intent. If I had a different preference for formatting than you, I might want to set my scm client to skip any patches from you that change formatting, but keep all the rest.

Finally, I do write comments in code.

Again, I'm not saying that it should work only at the level of the AST, just that patches should be as expressive as possible. Changing comments may well have no (easily expressible) description more useful than a simple text change. Of course we still need to support the simple things as well!

Josh Dybnis - Re: Darcs  blueArrow
2/9/2004; 5:05:25 PM (reads: 308, responses: 2)
Now that someone else has cast the first stone, I'll comment too ;)

I don't mean that the system should only work at an AST level, just that patches should contain as much semantic information as possible, and as little cruft as possible.

What does that get us? Smaller patch files? More flexibility in the order patches can be applied? It sure makes the system a lot more complicated. It also needs to be customized for every language use; you lose all the benefits if you create a DSL.

andrew cooke - Re: Darcs  blueArrow
2/9/2004; 6:13:18 PM (reads: 302, responses: 1)
I don't mean that the system should only work at an AST level

why not? isn't the rest just pretty printing?

Bryn Keller - Re: Darcs  blueArrow
2/10/2004; 12:09:53 AM (reads: 290, responses: 1)
Now that someone else has cast the first stone, I'll comment too ;)

Great! I'm just trying to stir some discussion of what the future of source code management might look like.

What does that get us? Smaller patch files? More flexibility in the order patches can be applied?

Sure, those both seem useful to start with. More flexibility in which patches you actually choose to integrate into your version of the tree, perhaps. Maybe other things we haven't thought of yet.

It sure makes the system a lot more complicated. It also needs to be customized for every language use; you lose all the benefits if you create a DSL.

Yes, that's the big drawback, but maybe it's worth it sometimes? Emacs is tremendously capable, and you can edit almost anything with it, but (imho) it doesn't hold a candle to IntelliJ IDEA for editing Java in particular. Maybe there's a place for language-specific source code management as well.

Thoughts?

Bryn Keller - Re: Darcs  blueArrow
2/10/2004; 12:15:48 AM (reads: 284, responses: 0)
why not? isn't the rest just pretty printing?

Sure, there is that, but some folks don't see it that way. There's also the related question of what to do about sytactically incorrect fragments of code. To allow them, we have to allow plain text in addition to AST. Maybe we can say that syntactically incorrect code can never ever be checked in, I don't know. How much leeway do we actually need?

Ehud Lamm - Re: Darcs  blueArrow
2/10/2004; 12:53:25 AM (reads: 288, responses: 0)
I don't really see any advantage at this point.

Now maybe when we have languages that include source-code management constructs (or, rather, program configuartion mngmt constructs) as part of the language this will start to make more sense.