Lambda the Ultimate

The Art of Unix Programming
started 1/21/2004; 9:14:22 AM - last post 1/23/2004; 8:26:58 PM

Luke Gorrie - The Art of Unix Programming

1/21/2004; 9:14:22 AM (reads: 11530, responses: 11)

The Art of Unix Programming

Eric S. Raymond's new book on software engineering from the Unix perspective (past and present). This is an excellent book and very broad in scope. It has much to interest programmers in general and LtU readers in particular.

Some chapters have obvious LtU-appeal. Minilanguages reminds us of the Unix (Bell Labs?) school of language design and of the many DSLs that we take for granted today. Languages discusses popular alternatives for Unix programming and gives specific evaluations of the languages themselves and of their popularity and areas of application.

More subtle connections can be found throughout the text. I found the description of qualities like transparency and discoverability particularly insightful because they capture in essence what I enjoy so much about using certain programming languages. I suspect they also hint at what I miss the most in studying the formal and semantics-focused areas of programming language research.

The section on Multiprogramming also warmed my Erlang-programmer heart.

I bought this book after reading Ralph Johnson's recommendation.

Posted to Software-Eng by Luke Gorrie on 1/21/04; 9:36:33 AM

Dominic Fox - Re: The Art of Unix Programming

1/21/2004; 9:45:08 AM (reads: 901, responses: 0)

I read this recently, with much enjoyment. It's highly informative throughout, and mildly controversial in places, which qualities together ought to provoke a certain amount of informed debate...

Ehud Lamm - Re: The Art of Unix Programming

1/21/2004; 11:15:19 AM (reads: 876, responses: 0)

Related.

John Fraser - Re: The Art of Unix Programming

1/21/2004; 6:18:06 PM (reads: 773, responses: 0)

This is a great book. I kept thinking about the similarities of Unix and Lisp as I read the book. Specifically, "Sequences as Conventional Interfaces" from chapter 2 of SICP came to mind a lot. It seems as though Lisp (and functional languages in general) and Unix have some things in common.

Mark Evans - Re: The Art of Unix Programming

1/21/2004; 6:23:52 PM (reads: 769, responses: 0)

Raymond pontificates elsewhere about his reluctant migration from Perl to Python as his language of first choice. Given such rationality it seems strange that he would mount a defense of Unix cruft.

Pseudonym - Re: The Art of Unix Programming

1/21/2004; 7:07:59 PM (reads: 755, responses: 0)

I found the section on multithreading annoying when I read it last year. He's absolutely correct that there are a lot of novice programmers who are using threads when they shouldn't be (i.e. when another solution such as event loops, multiple processes or coroutines would be better), and it is true that Unix programmers don't have a history of using threads, but it's simply incorrect to call them "threat or menace".

For example: "Threads are a fertile source of bugs because they can too easily know too much about each others' internal states. There is no automatic encapsulation, as there would be between processes with separate address spaces that must do explicit IPC to communicate." LtU readers may smile knowingly if they wish.

Maybe I'm biased. I hack database servers for a living, which is hardly the "common case" when it comes to concurrency issues. I find that there are two possible reasons why thread programming is hard: either it's overkill for your problem (ESR's contention), or the concurrency problem that you're trying to solve is inherently hard and the difficulty of the solution matches the difficulty of the problem.
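To make one of the alternatives to threads mentioned above concrete, here is a minimal sketch of coroutine-style concurrency in Python. The names `worker` and `run_round_robin` are hypothetical and chosen only for illustration; this is not code from the book. Each task is a generator that yields whenever it is willing to be interrupted, so the interleaving is explicit and no locking is needed.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative task: does one unit of work, then yields control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # the only point at which another task may run


def run_round_robin(tasks):
    """Run generator-based tasks in turn on a single thread until all finish."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)          # resume the task until its next yield
        except StopIteration:
            continue            # task finished; do not reschedule it
        queue.append(task)      # task still has work; put it at the back


log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

Because a task can only lose control at a `yield`, the shared `log` list is never observed in a half-updated state. That is the property that preemptive threads do not give you for free.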

Luke Gorrie - Re: The Art of Unix Programming

1/22/2004; 4:38:17 AM (reads: 667, responses: 0)

But by what criteria do you decide when threads are the right solution?

Shared-memory threads are overly encouraged by a lot of popular programming languages. I think that most programs choose threads for convenience (the language includes support, other code uses them) and not because isolation with explicit IPC (as in Unix and Erlang processes) is fundamentally undesirable or impractical (on the contrary!).

I've been doing a fair bit of threads-hacking recently myself, even though I would much prefer not to. But I'm working with one of the many languages that have decided to throw their lot in with threads, so I use them partly because there is no convenient alternative but mostly because everyone else is using them and I have to interoperate. It seems to me that when a language takes threads as its means of concurrency then soon enough everybody has to worry about them.

I hope language designers nowadays will pick up the gauntlet Joe Armstrong threw down in his LL2 talk rather than just throwing in threads as a checklist-item. I'm especially curious to see how Arc will deal with concurrency too.

Jonathan Feinberg - Re: The Art of Unix Programming

1/22/2004; 4:46:10 AM (reads: 654, responses: 1)

I was quite eager to read this book, but I was rather disappointed with it. His belligerent and self-righteous tone made the book hard to bear. If the book had seen a decent editor it would have been 2/3 the length and twice the book!

But I love Unix, and I agree with many of the design principles espoused in the book.

Luke Gorrie - Re: The Art of Unix Programming

1/22/2004; 5:11:35 AM (reads: 654, responses: 0)

I was quite eager to read this book, but I was rather disappointed with it. His belligerent and self-righteous tone made the book hard to bear.

Being familiar with ESR's style I braced myself for this at the outset. It helped to have been prepared by reading a good summary of the more grating aspects beforehand. People who doubt their stamina may be advised to skip the "Operating System Comparison" part.

In spite of this it is still an absolutely excellent book in my opinion.

Pseudonym - Re: The Art of Unix Programming

1/22/2004; 10:03:47 PM (reads: 483, responses: 0)

But by what criteria do you decide when threads are the right solution?

At the risk of sounding obvious, using the same criteria as you use to make any other software design decision.

You think hard about the problem, about the problem domain, about the code that you already have, about the language you're implementing it in and its features and idioms, you discuss it around a whiteboard with colleagues, you read the latest research on the topic, you propose thought experiments, maybe run some actual experiments... but most of all, you use your experience and judgement as a professional software developer. Then you profile your code and see if you were right.

As an example, you noted that Erlang positively encourages use of what it calls "processes", but which in the absence of a distributed architecture are actually threads. It's one of Erlang's primary encapsulation mechanisms. It follows that in Erlang, threads are part of "the right solution" to many problems.

Having said that, how useful threads are compared with separate processes with IPC largely depends on how closely the processes need to cooperate. As I said, I work on database servers, where maximising concurrency often pays big, even at the cost of complex and hard-to-verify locking protocols. Locking individual disk pages is par for the course. Thankfully, database research is one of those areas which attracts a lot of grant money, so we have a large body of smart (and not-so-smart) researchers working on the problems, so we also have a lot of "been there, done that" research to consult.

Luke Gorrie - Re: The Art of Unix Programming

1/23/2004; 11:41:36 AM (reads: 381, responses: 0)

How does one argue with such a well-reasoned statement as that?

The distinction I have in mind is that "processes" have their own private 'address spaces' whereas "threads" execute concurrently in a shared address space. Processes communicate only by explicit IPC, whereas threads communicate implicitly by changing shared data using custom locking/synchronization.

In this sense Erlang and Unix have "processes". Multiple /bin/sh processes don't have to worry about accidentally clashing when updating internal state, and nor do Erlang processes. The IPC protocols have to be correct but everything else can totally ignore concurrency.
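The distinction drawn above can be sketched in a few lines of Python on a Unix system. This is only an illustration under stated assumptions: `bump`, `run_in_thread` and `run_in_child_process` are hypothetical names, and a pipe stands in for whatever IPC mechanism a real program would use. A thread's write to shared data is visible to its creator, while a forked child mutates a private copy and has to report back explicitly.

```python
import os
import threading


def bump(state):
    """Mutate the state in place."""
    state["value"] += 1


def run_in_thread(state):
    """A thread shares our address space, so its write is visible here."""
    t = threading.Thread(target=bump, args=(state,))
    t.start()
    t.join()
    return state["value"]


def run_in_child_process(state):
    """A forked child gets a private copy; it must use IPC (a pipe) to report."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()  # Unix only
    if pid == 0:
        # Child: change the private copy and send the result to the parent.
        os.close(read_fd)
        bump(state)
        os.write(write_fd, str(state["value"]).encode())
        os.close(write_fd)
        os._exit(0)
    # Parent: read what the child chose to tell us.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        reported = int(reader.read())
    os.waitpid(pid, 0)
    return state["value"], reported


state = {"value": 0}
print(run_in_thread(state))         # 1: the thread changed our data
print(run_in_child_process(state))  # (1, 2): our copy is untouched; the child reported 2
```

The parent's value stays at 1 after the fork because nothing the child does to its own memory can leak back, which is the isolation being described. Erlang processes are not implemented with `fork`, so this illustrates only the Unix half of the comparison.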

Threads have the added complexity we all know about. Suddenly any piece of non-reentrant code in a library anywhere can lead to a week of debugging, "thread-safety audits" of the codebase are needed, and lots of non-locality is introduced by implicit locking contexts and so on. This is much more work, and I think we want to avoid it if we possibly can!
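A small Python sketch of the non-reentrancy problem mentioned above. Everything here is hypothetical and deliberately contrived: `label_unsafe` keeps intermediate state in a module-level variable, and a `threading.Barrier` stands in for an unlucky preemption so that the clash happens on every run rather than once a week.

```python
import threading

_scratch = {}  # module-level scratch space: this is what makes label_unsafe non-reentrant


def label_unsafe(name, barrier):
    """Stores its argument in shared scratch space before using it."""
    _scratch["name"] = name
    barrier.wait()  # stand-in for being preempted in the middle of the call
    return "item-" + _scratch["name"]


def label_safe(name, barrier):
    """Same logic, but the intermediate value lives in a local variable."""
    local = name
    barrier.wait()
    return "item-" + local


def run_pair(fn):
    """Call fn concurrently from two threads and collect each thread's result."""
    barrier = threading.Barrier(2)
    results = {}

    def worker(name):
        results[name] = fn(name, barrier)

    threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


unsafe = run_pair(label_unsafe)
safe = run_pair(label_safe)
print(unsafe["a"] == unsafe["b"])  # True: one caller got the other's label
print(safe)
```

Both threads pass the barrier only after both have written to `_scratch`, so whichever wrote last wins and the other caller silently gets the wrong answer. With separate processes the same function would be harmless, since each process would have its own `_scratch`.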

My complaint is that often one ends up dealing with threads by default, not because of any special need. You can't in general ignore threads in languages like Java and Lisp because other people are using them and you have to interoperate. They are probably using them not because shared-memory efficiency is absolutely critical to them, but because they want to do two things at once and threads are the idiom for that. It's just surrender to the threads-monster.

(I am sorry to go on and on! I have recently written some Lisp code that has to interoperate with the non-standard thread abstractions of six different implementations. Currently it is known to work correctly in one. I hope to have that up to two or three over the weekend.)

P.S., the database I use is written in Erlang with processes. :-)

Pseudonym - Re: The Art of Unix Programming

1/23/2004; 8:26:58 PM (reads: 320, responses: 0)

How does one argue with such a well-reasoned statement as that?

People tell me I have this ability to write content-free statements that everyone can agree with. I may have missed my calling in politics. :-)

The distinction I have in mind is that "processes" have their own private 'address spaces' whereas "threads" execute concurrently in a shared address space. Processes communicate only by explicit IPC, whereas threads communicate implicitly by changing shared data using custom locking/synchronization.

Fair enough. We were talking Unix, so I was using the Unix definition of "processes" and "threads".

What complicates things a little, of course, is that there is a continuum between these because different languages have different ideas as to what constitutes "state". Erlang has the assumption built deep into the language that every "process" has its own address space, which has both benefits and drawbacks compared with other approaches. In a language like Haskell or Clean, concurrency gives you a different set of points on the performance tradeoff curves (IPC corresponds to pointer copying which theoretically might make it cheaper, and you lose the ability to garbage collect threads separately, to pick but two), but with similar safety benefits over Java-style threading.

To paraphrase Peter van Roy in one of his guest blogs for this site: ease, concurrency or mutable state, pick any two.