Influence of cognitive models on programming language design

Programming languages are ultimately interfaces for people to interact with the world of computational possibilities. Against that backdrop, I've recently been interested in the influence of cognitive models on programming language design, and I'd like to hear the thoughts of the LtU community on the topic.

I think the recent rise of DSLs warrants more research in this area. Past work has largely been dominated by debates on completeness, compiling vs. interpreting, efficiency, dynamic vs. static, typed vs. untyped, parallel vs. sequential, distribution, designer idiosyncrasies and, of course, elegant vs. ugly.

While I don't deny the importance of these debates, the only one that comes anywhere near the territory of cognitive models is the notion of "elegance", and that is (as far as I know) never dealt with formally.

I see a few areas of work here -

  1. Bringing schemes from models of brain function and cognition to programming languages.
  2. Using findings of cognitive psychology in designing aspects of programming languages.
  3. Studying how we model the languages that we use and "like" - i.e. cognitive modeling of programming languages.

(Maybe others strike you.)

For an example of (1), Drescher's schema-building mechanism seems a very interesting angle on building interfaces to the world of computation (see Made-Up Minds)... and it does look like people are trying to apply that approach to special areas. Production rules (grammars) are another category. And of course, one can't forget Prolog.

AppleScript and HyperCard are simple examples of (2), since they were intended to be usable by non-computer-science folk and yet are powerful enough scripting languages. Elements of the language such as "tell ..." and "it" exploit our ability to refer to "at hand" and relative entities without much effort. The Logo world and, particularly, Scratch must be mentioned too. I myself got interested in this topic after I worked on adding "the" and "it" to my own Scheme toy (sketched below) and found some cognitive modeling results, such as working memory size, useful in determining usage boundaries.
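To make the mechanism concrete, here is a minimal sketch (in Haskell rather than my Scheme toy, with made-up names) of an evaluator that threads an "it" register, so the previous result can be referred to without naming it:

data Expr = Lit Int | It | Add Expr Expr

-- The evaluator threads the most recent result as the meaning of It.
eval :: Expr -> Int -> (Int, Int)  -- takes the current "it", returns (value, new "it")
eval (Lit n)   _   = (n, n)
eval It        it0 = (it0, it0)
eval (Add a b) it0 = let (x, it1) = eval a it0
                         (y, _)   = eval b it1
                     in (x + y, x + y)

Here eval (Add (Lit 2) It) 0 yields (4, 4): the "2 + it" reading, where "it" is the 2 just computed.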

(3) is interesting as part of the design iteration loop.

Thoughts?

---
Link summary (extracted from posts) -

  1. Chris Barker
  2. Inform (language for interactive fiction)


Social models more relevant

While programming may be a mental exercise, it is rarely an exercise completed by individual programmers on any non-trivial project. More important, I suspect, is designing to integrate the efforts of, say, ten thousand programmers working on hundreds of projects that integrate in ad hoc ways.

When dealing with large groups of programmers and large codebases, it shouldn't be an error to model individual programmers as having narrowly scoped goals, very limited foresight and knowledge of the codebase, and a severe distaste for reading or modifying existing code except as necessary to tweak it for their own work.

What language features would be useful in that hugely scaled environment? That's the question I think we should be answering. Because, whether we or our languages admit to it or not, that's already the environment in which most of us are working.

I'm laboring on my own answer: a language scalable enough to support tens of thousands of concurrent programmers with interdependent projects. I won't go too much into it here (I don't want to sound like a brochure), but I think language and development-environment design would do well to learn from existing large-scale projects like Wikipedia.

While programming may be a

While programming may be a mental exercise, it is rarely an exercise completed by individual programmers on any non-trivial project

This is most definitely false. Programming is often performed by one or a few people writing entire compilers, kernels, games, and so on, mostly by themselves. A lot of non-trivial programs and libraries out there were written by one or a few people. If you want evidence, just look at most successful OSS projects: before they were popular, one or two people were behind them. In this case, programming is the mental exercise that dominates, and process is to the side (though still important).

What you are talking about is software engineering, which tries to scale programming to larger projects and larger groups of people. In this case, process dominates, and programming becomes less important. Of course, when less effort is devoted to programming, less gets done, but what else can you do on large projects?

I would argue that MOST of the programming performed is done by very small groups (often just one person), and therefore, studying programming in isolation of larger-scale software engineering is useful. Of course, software engineering is also important and useful to study. But they are really different things...

"Libraries" written by and for just a few people?

A lot of non-trivial programs and libraries out there were written by one or a few people.

If you integrate a library written by other people, your project has now been written by multiple people.

If you integrate an OS written by other people, your project has now been written by multiple people.

If your project uses open standards and protocols designed by many people, your project has now been written by multiple people.

When these libraries, operating systems, and standards are upgraded, it influences the projects that utilize them. When you're looking at programming in the large, it is almost never performed by individuals. Projects are interdependent even when individual programmers have narrowly scoped goals.

Programming is often performed by one or a few people writing entire compilers, kernels, games, and so on, mostly by themselves. If you want evidence, just look at most successful OSS projects: before they were popular, one or two people were behind them.

I agree that successful projects often begin their existence written by individuals. But you won't find many successful kernels or games that never have more than a few people laboring on all the different pieces of them.

What you are talking about is software engineering, which tries to scale programming to larger projects and larger groups of people.

Well... not really. I don't really believe software engineering is about scale so much as it is about achieving predictable results (in terms of completion, quality, budget, etc.). That concerns of scale get involved with software engineering at all is just a side-effect of realities surrounding all (or nearly all) non-trivial programming projects.

What I'm talking about is social modeling and engineering, and designing a language in such a way that it allows projects to scale and integrate in a self-regulating manner even under the premise that the programmers are for the most part operating independently based on their own goals.

Sometimes the little things matter, such as: Is source code named in hierarchical niches, or is it in a flat namespace? How confident can you feel in someone else's work and can you easily fix or upgrade it if necessary? How are the risks of malicious code injection controlled (security, etc)? Can you trust the optimizer, or will you need to do xyzzy yourself to do it right? If someone else has done what you need, can it be slightly tweaked to make it do what you need with minimal invasive change to the code? How 'fragile' are the language components under refactoring, feature addition, and forward compatibility?

The design decisions we make when designing a language can be targeted towards individuals or towards groups, and it is certainly possible to achieve a balance of the two. Heavy support for extending the language with EBNF macros, for example, is something that favors individuals in the small scale, but groups could take advantage of it by standardizing and agreeing upon a few DSLs for, say, describing interactive scenegraphs for an object browser (e.g. something akin to Inform language).

I would argue that MOST of the programming performed is done by very small groups (often just one person), and therefore, studying programming in isolation of larger-scale software engineering is useful.

I would argue that MOST of programming involves combining stuff written by other people and, therefore, studying programming with the idea of mashups and project composition firmly in mind is even more useful.

I'm not talking about disabling the individual programmer. The sort of social engineering decisions above would effectively allow individual programmers to be highly productive in spite of having narrowly scoped goals, limited foresight and knowledge of the codebase, and a distaste for learning more.

Indeed, such individuals may 'feel' they completed useful projects on their own... but that would require the (erroneous) perspective that the libraries, systems, standards, and protocols they utilized were given to them by mother nature.

When these libraries,

When these libraries, operating systems, and standards are upgraded, it influences the projects that utilize them. When you're looking at programming in the large, it is almost never performed by individuals. Projects are interdependent even when individual programmers have narrowly scoped goals.

Most programming tasks aren't so glamorous. You produce something that no one will ever reuse, just for some task at hand. When you look at programming in terms of population and tasks, I'm sure that solitary programming dominates. Even most of the programs out there were written initially by one or two people: Linux, Emacs, OmniGraffle, compilers for Ruby and Python; even javac was written by a small team. There are very few projects out there with more than 10 programmers (say, Windows), and when projects grow beyond a couple of programmers, process dominates, and most of the people involved aren't contributing much code.

I agree that successful projects often begin their existence written by individuals. But you won't find many successful kernels or games that never have more than a few people laboring on all the different pieces of them.

Yes, plenty. World of Goo comes to mind as a game, OmniGraffle as an office-style application; kernels are a bit more difficult if you include drivers, but if you are just talking about the core kernel, I don't think the team of direct contributors to Linux is very big. Anyways, go through the applications you use daily and I'm sure you'll find lots of examples.

What I'm talking about is social modeling and engineering, and designing a language in such a way that it allows projects to scale and integrate in a self-regulating manner even under the premise that the programmers are for the most part operating independently based on their own goals.

Fair enough. My counter-argument is that you might miss the mark, because the problems with most large-scale projects don't occur at the programming level; rather, they suffer from problems in project management, design requirements, communication, integration, and so on... I don't see how you can shove those things into a programming language, but if you could, that would be great.

I would argue that MOST of programming involves combining stuff written by other people and, therefore, studying programming with the idea of mashups and project composition firmly in mind is even more useful.

Ah, we all agree reuse and components are good. But the problem you are talking about is much harder: how do you get multiple programmers to work closely together on a project? It's easy enough to produce a library when the line of communication, coordination, and collaboration between the producer and user is almost nil: the library had better be well designed or the user will just skip it. There isn't much of a social process there.

Anyways, I see what you are saying and wish you luck in looking at this. I'm just arguing against your premises; half the battle in research is just figuring out what the problem really is.

Most programming tasks

Most programming tasks aren't so glamorous. You produce something that no one will ever reuse, just for some task at hand.

Agreed. But it is exactly in those situations where 'you' become the person reusing other things in order to accomplish the task at hand. Integrating components written by other people is a non-trivial, and ultimately social, task.

By no means was World of Goo developed by just three people... not when you include the components, documentation, service, integration, etc. associated with use of Simple DirectMedia Layer, Open Dynamics Engine, XML, bug trackers and version control, font generation, and so on.

problems with most large-scale projects don't occur at the programming level; rather, they suffer from problems in project management, design requirements, communication, integration, and so on...

Well, I wouldn't say that problems with large-scale projects don't occur at the programming level. And I'd certainly consider 'integration' to be a programming-level problem. But I readily agree that solving just programming level problems would be far from sufficient to solve all problems.

For technology-based support of software engineering, I think the better approach is to focus on enhancing features of the integrated development environment (including source version control, bug-tracking, feature-requests/user-stories, configuration management, quality assurance and testing, etc.). The enhancements or properties needed often don't make sense in the language proper, excepting that it'd be nice if all the IDE features were available in libraries.

In the broader sense, more than just project management could be managed by such IDEs. One could support an economic model with technology as well... e.g. allowing people to pool or auction rewards and incentives towards achieving new features or greater optimizations and integration of open source technologies, and offering clouds or grids in which one can run active services (e.g. video game servers) at competitive rates.

But the problem you are talking about is much harder: how do you get multiple programmers to work closely together on a project?

That's not the problem I'm concerned with. Small cliques of programmers will form naturally without any special language or IDE design efforts. Besides, if I wanted to 'force' people to 'work closely' with one another, I'd actually have a reverse policy... similar to how locking files in a version control system can 'force' you to go meet people you'd otherwise rarely go see, simply because they forgot to unlock a file.

Rather, the important question is a slight (but significant) variation: how do I support integration and enhancement in a system where individuals and small cliques are mostly acting on their own individual interests? I.e. what I want is cooperation as an emergent behavior, not as a required one.

I do have answers (maybe even correct ones) to this question, both for integrating and enhancing active services (which supports closed source services) and supporting cross-project integration, enhancement, and refactoring work in an open source arena. While I don't assume malice is common (I couldn't find answers for a largely byzantine environment) I also have answers for many relevant security, resource management, and privacy concerns.

half the battle in research is just figuring out what the problem really is

Indeed. I've been working on it for seven years now. I can't say with certainty I have the problem right, but I can say I've given it a lot of thought, observation, hypothesis and testing. I'm reasonably confident I have the problem right, and that the problem is not in any significant sense cognitive models in the small scale of individual programmers.

"How well does the language support integrating stuff other people write?" is, I believe, a far more relevant question than "How well does the language support the individual mind?" Frameworks and libraries won't take you very far if it becomes a combinatorial hassle to put more than two of them together (e.g. due to mutex management, diverse memory management, safety issues, inability to tweak them a bit them prior to integration, etc.).

what I want is cooperation

what I want is cooperation as an emergent behavior

It sounds like you're talking about cooperation in the positive sense - that people accommodate each other, work with each other, etc. It might be worth expanding that to *all* co-operation to include disagreements, fist fights, whatever.

(a) "How well does the language support integrating stuff other people write?" is, I believe, a far more relevant question than (b) "How well does the language support the individual mind?"

I've not thought much on this topic, but my gut feel is that the two aspects you mention are intertwined - in ways that we probably don't understand quite well yet. For example, it seems to make sense to ask the question "how does the language support *me* integrating stuff that I developed in the past?". Is that (a) or (b)? Design-wise, it probably has the same issues as (a), but cognition-wise it seems more like (b).

It sounds like you're

It sounds like you're talking about cooperation in the positive sense - that people accommodate each other, work with each other, etc. It might be worth expanding that to *all* co-operation to include disagreements, fist fights, whatever.

I would suggest that, as much as possible, the design of the language should be aimed towards positive cooperation and avoiding conflict. But I do agree that support for working in spite of disagreements and resolving disagreements should be included in the language and IDE design.

the two aspects you mention are intertwined

Likely, yes. But the degree to which two questions are intertwined or correlated is also the degree to which answering one provides an answer to the other. Thus, the more intertwined they are, the smaller the potential penalty for ignoring one of them.

If you integrate a library

If you integrate a library written by other people, your project has now been written by multiple people.

If you integrate an OS written by other people, your project has now been written by multiple people.

If your project uses open standards and protocols designed by many people, your project has now been written by multiple people.

and if I might add - you're probably using a programming language designed by other people :)

Indeed.

And that language is also among those things that will change over time.

Re "mother nature" and "combining stuff"

To spoil an old joke, there's a great difference between "standing on the shoulders of giants" and "stepping on each others' toes". It seems that your discussion of "social modeling and engineering" is blurring the distinction between three modes of programming that I experience in different ways:

1) consumer - a "lone wolf" programmer who is (by definition) using languages, libraries, etc. created by others, but who is treating them as artifacts of nature simply to be used as they are, while she/he pursues a private agenda. The interaction pattern is read-only. The self-description might be, "If I find something useful, I'll use it; otherwise, I'll write it myself."

2) producer - a programmer (lone or otherwise) who is creating a language, library, etc. that will / may be used by others, but with an agenda not driven by their requirements/feedback. The interaction pattern is write-only. The self-description might be, "I'm writing this for my purposes. If someone else finds it useful, they're welcome to it; otherwise, they should look elsewhere."

3) collaborator - a programmer who is working concurrently and interactively with other programmers on a project shaped by their varying goals and contributions. The interaction pattern is iterative read/write. The self-description might be, "We're working together -- at least for the moment -- on something that won't happen without the accumulated contributions. We'll have to negotiate and compromise as we go."

I'm not trying to propose a taxonomy, but looking to emphasize the difference between no-man-is-an-island generalities and "real" social and cooperative development, as a matter of the individual's orientation and attitude toward others.

To switch metaphors, I may prepare my solitary breakfast (with the implicit recognition that other people existed, such as the farmer who grew the oats), but that's a radically different process from collectively preparing a family holiday dinner in real-time with a kitchen full of others. (And that, in turn, is different than operating a communal kitchen for a festival event with a population of thousands.) It would seem to me that there's an entire range of issues (packaging, portioning, utensil design, room layout, etc.) that will be different among these extremes.

Are you suggesting that there are language design issues that benefit all of the cases of the "lone wolf" programmer, the small in-house development team, and the loosely-coupled open source project?

First, do no harm.

language design issues that benefit all of the cases of the "lone wolf" programmer, the small in-house development team, and the loosely-coupled open source project?

Well, every language design issue affects people driven by the various incentives you name, and given the range of language design decisions one should be extremely surprised if there weren't some language design decisions (and especially combinations thereof) that benefit all three.

But we shouldn't restrict ourselves to design decisions that benefit all classes of work; it is more that we should find alternatives to combinations of design decisions for which there is justifiable reason to believe it might significantly hinder those operating under any particular incentive.

Anyhow, consider a slight variation in the user stories:

1) consumer - a "lone wolf" programmer who is integrating frameworks, libraries, macro DSLs, etc. into a product (potentially intended to be a one-off product), and who is likely bumping into and resolving issues to make the integration possible. Sometimes, the lone wolf will discover it easier to fix issues in their source rather than work around them in his own (especially if said lone wolf bumps into the problems in more than one project), and so the lone wolf would often like the ability to commit framework/library tweaks (even for one-off products). For long term products, issues of maintenance further motivate the ability to push changes.

This lone wolf also might have no particular concerns about sharing his/her product (i.e. it doesn't contain anything private), especially if it means (a) the ability to work on it directly in a public wiki-style repository from any javascript-enabled browser on any network-enabled computer, and (b) a culture such that when other producers update the stuff they wrote (e.g. refactoring names or APIs, generalizing something, etc.) they'll fix the lone wolf's public code too.

Pure self-interest motivates refactoring/tweaking/sharing.

Language design decisions to aid the lone wolf especially include support (a) for dependency injection, such that the lone wolf can modify a library to change an internal component based on an external provision, then set the default for said component to whatever the original value happened to be, and (b) for injecting default parameters, such that the lone wolf can add a new parameter to a function without needing to modify other people's code. A minimal sketch of both follows.
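Here is that sketch in Haskell (hypothetical names; illustrating the technique, not any particular library):

newtype Logger = Logger { logMsg :: String -> IO () }

defaultLogger :: Logger
defaultLogger = Logger putStrLn            -- the library's original behaviour

-- (a) dependency injection: an internal component becomes an external provision
processWith :: Logger -> [String] -> IO ()
processWith lg = mapM_ (logMsg lg)

-- (b) injected default parameter: existing callers keep working unmodified
process :: [String] -> IO ()
process = processWith defaultLogger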

2) producer - a programmer who creates a library or framework for his own project, not particularly concerned about how others use it, not particularly interested in reading or understanding or learning about existing projects that solve the same goal. The pattern is write-only: reinvention rather than reuse. We can presume the producer often doesn't care whether other people modify the product so long as said modifications don't cause runtime bloat or inefficiencies, or break anything in his own product.

Producers concerned about forwards compatibility of their product among users benefit if they can make changes to, say, the API of their project then simply find all users in the public repository and push changes to them. This would be similar in concept to making a change to GNU Math and pushing the API update through to every user of GNU Math on all of SourceForge. Other people than the producer may also go through and fix such things (wiki-gnoming supported by unit tests?).

Others will tweak for integration, modify, and refactor away duplicated efforts (e.g. "yet another implementation of the Ackermann function", as a highly contrived example). If concerned about toes being stepped upon, the producer may choose to keep a private version of the project in a private repository that inherits (in distributed version control fashion) from a public repository. With DVC this allows cherry-picking of changes made to his/her work, and allows pushing of updates he/she makes. If feeling lazy, the producer may simply allow others to pull from his private repository and cherry-pick/push changes to the public one themselves. Maintenance of a private repository is the common case today.

The life of a producer is, thus, no worse than it is today, and is possibly much better... especially so if the library/framework/etc. utilizes other libraries, or if the producer isn't the sort of masochistic genius programmer who has and applies the foresight such that the resulting product is 'perfect'.

As a producer, I would be more willing to have my code modified, or references to external code added, if the language provides whole-program and partial-evaluation optimizations such that tweaks like adding parameters to functions don't bloat the code for fixed parameters, and referencing other pages doesn't introduce the need for more separately compiled components. (The benefits associated with separate compilation can readily be achieved by other vectors.) I'm also less likely to have problems if such changes can be guaranteed not to introduce concurrency (deadlock, safety) or security concerns.

3) collaborator - actively communicating programmers aiming to achieve a product that resolves cross-cutting concerns. Example products would be a robot command-and-control protocol, a graphics library, or a common scene-graph system for a virtual world interconnect. We can presume collaborators include consumers, who each have motivations pushing for certain changes or additions, along with producers, who actually know enough to make the changes but lack the resources to try all of them.

Distributed version control and public repositories help a great deal for collaborators who are willing to participate. All collaborators are aided if it is easy to branch the whole system, try a change, integrate it, test the integration, and commit it back if it produces the desired effect and all the unit and integration tests pass. Consumer collaborators thus have an easier time operating as producers of the system, and producer collaborators have greater access to a testbed. If the public repository can include a class of 'tests' that are run every time dependencies change and alert the appropriate individuals on failure, so much the better.

Language design decisions that aid the collaborators include first-class support for service configurations, such that they can be abstracted, fired up and tested, and instantiated with references to the real world to get real services running; in addition, support for confinement so that such tests don't interact with the 'real' world, and support for overriding aspects of processes so that, for example, the state inside a process/actor can be replaced with a reference to a database, allowing systematic observation and unit/integration testing.

-------------

Of course, all this doesn't touch on other classes of users, such as those with continuously running services and those who wish to upgrade clients and services at runtime without touching the code with which said clients or services were written. There are language design decisions that help these guys a lot too, including support for automatic distribution of code (so clients can put agents near services and vice versa), security, recognition of relative levels of distrust, and support for arbitrary distributed transactions (which allow ad-hoc composition of protocols and services without introducing race conditions).

Data, "Scalability"

Here's where there is a lot of room for a curious researcher, in my opinion. You guys have had a good back-and-forth about 'individuals', but wouldn't it be interesting to do a survey to see what sorts of social programming models prevail? You could look at how much code is written by how many people and, for groups, what the distribution is like. You could look at the influence (or rather, correlation) of languages on those numbers.

On language scalability: I think the true measure of a language's "scalability" is, in most cases, the range it can handle, not the absolute high end. In other words, a language might be well suited to 10,000-person projects, but if it's no good for whipping up something quick, it has a limited range. I'd rather use a language that is good for a quick hack and still performs OK on a huge project, even if not quite as well as one that only works for huge projects.

Re: Scalability

Good idea, but the caveat I would add is that looking at what large groups use is naturally going to select against obscure languages and models, even if they are perfectly well suited to large projects involving many developers.

Similarly, PHP is a very popular language for large group projects, but I doubt it has any real technical advantages in that setting...

I was referring more to

I was referring more to technical aspects than to 'social' aspects such as how popular a language already is. Something like Java seems to be popular for large projects, and reasonably good at them, in that it enforces some barriers. It's not so good at smaller, nimbler things, and often requires more boilerplate just to get started (at least, that has been my experience). Something like Ruby, on the other hand, might be great for smaller projects and still work OK for bigger ones, although of course if people start doing lots of "monkey patching" type activities, that could sink a larger project in a hurry.

When "whipping up something

When "whipping up something quick" or performing a "quick hack", is it not usually the case that you are, in actuality, composing and enhancing services built already by other people? If so, does that not require scalability of the language, such that hooking together services and frameworks and such to produce new ones can be performed without gotchas or considerable effort?

If it takes a great deal of care or knowledge to combine services and frameworks safely, without deadlock, without error, and without sacrificing performance, then the language is not upwards scalable because composition of services will need to be "shallow" to be successful.

Except in the case where one is building a new project using only the 'core' language features, "quick hack" projects benefit primarily from the high-end form of scalability. And, while I do believe the ability to write quick functions and vocabulary from scratch is useful, I will happily sacrifice it in favor of shaping the language such that the path of least resistance encourages everyone to code in a manner that automatically supports safe, efficient, flexible, and comprehensible project composition.

That said, I don't tolerate boiler-plate. The need for boiler-plate code greatly resists scalability.

Definitely an interesting

Definitely an interesting angle on the problem, though I wouldn't rule out the significance of individual programming effort as a "rare" thing. For one thing, I do a *lot* of "individual programming" :) ... but even there, my style and preferences have been recursively honed by my exposure to the work of the community. So programmer development as a consequence of programmer-community interaction is probably the way to look at it. Vygotsky's social development theory is relevant, I think.

Saying you built your house

Saying you built your house all on your own doesn't give proper credit to those who paved the roads, built your truck, trimmed your lumber, and gave you some power tools. But, also relevantly, it doesn't give the necessary credit to how your needs, and those of others like you, shaped the very industry that gives you lumber and power tools.

People who do "individual programming" are very, very rare. People who think they do "individual programming" are very, very common.

What we call "individual" is usually a social dynamic of "implicit emergent cooperation". Distinguishing between these should be useful when making design decisions for IDEs and languages, especially with regards to sharing access to projects.

???

Forgive my ignorance, but it seems this model relegates "individual programming" not to rarity but full extinction. Is there an example of someone, real or imaginary, who has constructed a program without the assistance of any human, that is, someone who is an "individual programmer" under the given model?

Is there an example of

Is there an example of someone, real or imaginary, who has constructed a program without the assistance of any human

I'd include among such people those that built their own circuit boards. But one could also recognize degrees to which a project revolves around individual programmers in a non-binary sense. I would, to a lesser degree, include those that write assembler, write original drivers, certainly the dude who wrote Synthesis OS.

To an even lesser degree, I would include those who use a 'pure' higher level language and compiler in a manner suitable for writing an OS from scratch (no libraries beyond the standard, no integration with an existing OS). This would also include many various 'trivial' projects of the sort you might see when learning Haskell, Scheme, or C in school, where use of libraries is often forbidden because you're learning how to write your own linked lists.

this model relegates "individual programming" not to rarity but full extinction.

The problem isn't with the definition, but with the fact that "individual programming" really is near full extinction, and has been moving steadily closer to it. If it survives today, it's in embedded systems and sometimes due to bad lessons taught to beginning students in programming about their role as programmers (sometimes they come out of school feeling they really need to write their own linked lists...).

We should recognize this, call it a good thing, not elevate the status of the individual programmer based on illusion or egotistical desire.

Right now, despite the fact that individually developed programs are rare for anything more than school exercises, (1) programmers are still treated by IDEs as gatekeepers, masters of a domain for individual projects, and (2) language projects aren't designed to integrate naturally: makefiles are problematic, name conflicts abound when just tossing code together due to the heavy use of hierarchy, and code is often copied from project to project.

Our tools should treat the programmer and the code as they are: elements in an interactive and reactive society. This actually empowers 'individual' programmers by giving them more power to utilize, update, refactor, and document code written initially for projects other than their own.

There is no cost in terms of IP when code needs to be protected... even within a company, one could inherit code from an open-source arena using distributed version control, then separately add and manage private code, making it easy for different project teams working on different projects to share and integrate (and even to push some enhancements back to the open-source arena).

Even without sharing one's own project, access to other projects, and recognition that code is a social effort that is mostly about integrating existing libraries, plus the language features necessary to make that practical (i.e. integration without makefiles and complicated build routines, the ability to write libraries usable and optimized without boilerplate code, support for automatic and continuous unit testing, the ability to be alerted of updates to certain pages, etc.), would offer considerable advantage.

You can do "individual

You can do "individual programming" by just thinking. Maybe you can use paper/pencil/whiteboard as the next step... ok maybe you use a word processor. Hmmm... why not an IDE then. .... but you're surely "running" your programs in your head, so you don't really need a computer, do you ....

It is definitely individual cognition. Nevertheless, it has been shaped by interactions with the society as well. That's why I think both points of view have to be considered. It is not possible to isolate them or say one is more important than the other. Each is impossible without the other.

People who do "individual

People who do "individual programming" are very, very rare. People who think they do "individual programming" are very, very common.

No way to verify such a statistic, but your statement surely implies the phenomenal success all programming languages to date have achieved w.r.t. collaboration :)

What it explains today is

What it explains today is DLL hell, the headaches we go through with frameworks and associated boilerplate code, the serious challenges of sharing structured data between processes, deadlock concerns, the difficulty we experience when integrating shared libraries, etc. Because, with few exceptions, what we have today is minimal support for collaboration hacked in atop languages, IDEs, and operating systems each designed with the idea of the individual programmer in mind.

Linguistics and Computation

If you've not already come across it, then the work of Chris Barker may be quite interesting to you. He has published several papers on the links between natural languages and formal languages. In particular, he gave a keynote at POPL this year called "Wild Control Operators" that seems to touch on points (2) and (3). I can't really offer much detail to describe it, as it is somewhat outside my area, but if you treat the semantics of natural languages as implementations of cognitive models, then his work would be right in the middle of the area you are asking about. There are several posters on Lambda who know his work far better than I do and could maybe give an explanation...

Thanks. I didn't know about

Thanks. I didn't know about Chris Barker's work. I skimmed the "Wild Control Operators" POPL abstract and it seemed to be about probing the formalism of natural language. I was expecting the other direction - picking operators in natural language and bringing them over to formal languages... which, however, raises the question: do end-user programming languages *have* to be formal?

do end-user programming

do end-user programming languages *have* to be formal?

Well, something needs to interpret them, right? But perhaps one could focus on 'best guesses', 'heuristic interpretations', and varying interpretations based on runtime observations. E.g., "kick the dog" would need to interpret "the dog" and "kick" appropriately based on whether the avatar is sitting or standing, where a dog is, and whether you're more likely referencing the one in a woman's arms or the one on the ground (even if the one in the woman's arms is closer), etc.
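A minimal sketch (hypothetical names) of such a heuristic interpretation in Haskell, scoring candidate referents from runtime observations rather than simply picking the nearest one:

import Data.List (minimumBy)
import Data.Ord (comparing)

data Dog = Dog { dogName :: String, dist :: Double, held :: Bool }

-- A dog in someone's arms is a poor kick target, even if it is closest.
score :: Dog -> Double
score d = dist d + (if held d then 100 else 0)

resolveTheDog :: [Dog] -> Maybe Dog
resolveTheDog [] = Nothing
resolveTheDog ds = Just (minimumBy (comparing score) ds)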

I suspect an 'informal' language could be good for AI, describing scene-graphs or skits for 3D characters, scripted behaviors.

However, even Inform language has a formal interpretation. Still, it's worth looking up.

Thanks for the link to

Thanks for the link to Inform.

I understand what you mean. In fact, I'm finding it difficult to declare something as informal as long as it has a computer implementation :)

You hit a nerve on the "varying interpretations based on runtime observations" point. We (as people) do that all the time, yet very few programming languages (if any) exploit that. I like Drescher's schema building approach for that reason.

"do end-user programming languages *have* to be formal?"

The axiom of formality for languages:

The degree to which a language is formal is the degree to which the language and its semantics/results/side-effects/abstract-interpretation are:

  1. Well-defined/unambiguous
  2. Simplified (from a reductionist perspective).

Put differently, IF you want your language's semantics to be well-defined and simple (from a reductionist perspective), THEN you will want your language to have an interpretation that can be formally treated.

The difference between a programming language and a formal language:

Programming languages have intrinsic resource-usage "semantics" (i.e. memory and time resources, and possibly other kinds of resources), while formal languages either don't, or they run on machines that are abstract or do not have physical limitations.

Isn't the definition of

Isn't the definition of formality something like "whose specification does not depend on anything outside the system"?

What do you mean?

That definition doesn't even make sense to me.

I assume you know what symbol grounding is. Isn't it the role of all languages to ground some aspects of some external phenomena within a system of symbolic expressions which represent/denote those phenomena?

No language is useful unless it is grounded, or applied (i.e. it is interpreted w.r.t. a mapping from expressions to external phenomena). Once we ground a language, then it can be useful.

From my experience, formality means (or implies?) that a language conforms, strictly, to a set of hard-and-fast rules. Of course, this definition is tentative.

In retrospect, I have a hunch I've misunderstood what you mean. If this is correct, what do you mean by "outside the system"?

formality ..

that a language conforms, strictly, to a set of hard-and-fast rules

That's in the spirit of what I wanted to say.

A formal system can be specified purely mechanically, without any mention of "use" or "connection to the real world" or anything of that kind ... with almost an attitude of "if you find a use for it, that's your problem". That should resonate with how some mathematicians approach number theory, for instance.

More formally (;-P) - a formal system is a set of postulates and a set of procedures or "rules" to derive "truths" from postulates and/or other "truths" of the formal system. You are forbidden from invoking any "rule" *outside* of the formal system in order to infer "truths".

When applied to languages, I understand non-formality to mean that I can write expressions in the language that depend on the interpreter (the thing that connects the language to the world) for truth or falsehood - i.e. the set of postulates and rules I'm willing to write down for the symbols of the language are intentionally insufficient to express the kind of "truths" I'm interested in.

On Integration

Program-level integration

  1. SOA
  2. Unix Pipes & Command line
  3. COM, COM+, and DCOM

These things all take the philosophy that the best way to make things reusable is by encapsulating their implementation behind a process wall, and by providing a standard system under which the processes can all be fully orchestrated.

Another option: I'm surprised nobody's caught on to "Documentation-by-Contract"! Have you ever noticed that there is a rigid structure to typical API documentation (at least, the way Microsoft documents their APIs) that could be formalized?! Documentation specifies the semantics of valid parameters, return values, invocation side-effects, etc. I'm sure there's something to be gained from this idea. Formal specifications are rigid, less ambiguous, and best of all, much briefer (not to mention their amenability to automated formal analysis).
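For instance, a minimal Haskell sketch (hypothetical helpers, not an existing library) of documentation-derived pre- and postconditions made executable:

requires, ensures :: Bool -> String -> a -> a
requires ok msg x = if ok then x else error ("precondition failed: " ++ msg)
ensures  ok msg x = if ok then x else error ("postcondition failed: " ++ msg)

-- The documented contract of an integer square root, checked at runtime.
isqrt :: Int -> Int
isqrt n =
  requires (n >= 0) "argument must be non-negative" $
    let r = floor (sqrt (fromIntegral n :: Double))
    in ensures (r*r <= n && n < (r+1)*(r+1)) "r is the integer square root" r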

SOA, COM/DCOM, Unix Pipes

SOA, COM/DCOM, Unix Pipes and such only encapsulate part of the implementation. In particular, data representation, protocol, and contract cannot be encapsulated. To compose these systems easily requires a great deal of support for integrating at the communications boundaries, which often involves sharing source.

As designs go, the SOA and dataflow approaches seem pretty solid... they still need a few enhancements to support security, secrecy, disruption tolerance, failover redundancy, graceful degradation, level of detail, ad-hoc cross-service transactions for service mashups, etc. Support for automatic distribution would allow parts of a composed service to automatically float over to wherever they need to be to optimize latencies and bandwidth. Just a few piddling trifles. ^_^

For source-layer and cross-process optimization, there are additional issues. By transporting messages in shared memory, one can avoid paying twice for data translation. Given shared source (or high-enough-level bytecode), one can potentially inline the necessary aspects of the intermediate processes and services... and automatically duplicate the relevant bits when automatically distributing a multi-part service configuration.

Because of these possible optimizations, programmers in SOA and Unix pipe systems, and to a lesser degree in COM/DCOM systems, are reasonably torn between reproducing efforts for efficiency and doing the simple thing. It is better if this pressure is avoided. Wherever possible, instead of encapsulating source and using separate compilation, share the source and support whole-system optimizations. This can be achieved with first-class dataflow pipes or process-calculi/actors-model based service configurations, but not (in general) with Unix pipes or SOA. Compile-time safety for data representation (plus possibly protocol and, stretching a bit, contract) would be icing on the multi-service cake.

I'm surprised nobody's caught on to "Documentation-by-Contract"! Have you ever noticed that there is a rigid structure to typical API documentation (at least, the way Microsoft documents their APIs) that could be formalized?!

More than a few people have noticed. I'm a believer in the idea that "anything that goes in a comment should really be automatically verified". However, in practice, there are limits to what we can check or test.

In any case, I'd suggest having languages support a formal system of annotations and an extensible syntax to include said annotations. Then allow users to process the AST including these annotations using tools they build. The same system can be used to offer suggestions to the optimizer.
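As a sketch of what I mean (hypothetical types, not a real tool):

data Ann  = Ensures String | Hint String deriving Show
data Expr = Lit Int | Add Expr Expr | Annot Ann Expr

-- A user-built tool: collect every annotation in a program, e.g. to hand
-- the Ensures clauses to a verifier and the Hints to the optimizer.
annotations :: Expr -> [Ann]
annotations (Lit _)     = []
annotations (Add a b)   = annotations a ++ annotations b
annotations (Annot n e) = n : annotations e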

In any case, I'd suggest

In any case, I'd suggest having languages support a formal system of annotations and an extensible syntax to include said annotations. Then allow users to process the AST including these annotations using tools they build. The same system can be used to offer suggestions to the optimizer.

You might be interested in Microsoft's CodeContracts being released for .NET 4.0.

Thanks for the suggestion.

Thanks for the suggestion. Microsoft uses preconditions, postconditions, and invariants to aid with runtime debugging, and manages partial static verification. Such an approach seems a good one for verifying certain classes of comments.

There needs to be much more study on this

Type and Category theory only takes you so far. You need to "skin" it. The future is tools and interactive systems.

There's a gal at Sun who has talked about this (sorry, I can't find her writings).

Parfait?

Would that be Cristina Cifuentes by any chance? It sounds like a quote from a talk on Parfait.

Possible research approach

I'm a psychology student at the University of Washington and the way in which programming languages influence the problem space in the mind of the programmer is one of my main interest areas (I'm also a professional programmer).

I recently completed a paper exploring the idea a bit, which folks may find interesting (or not :P). If nothing else, I believe it demonstrates a possible approach to studying this matter at the scale of the individual (it doesn't tie into the social issues people have surfaced here).

It's at the undergrad level, so there was no lit review, etc., and I had to hash it out from conception to completion in less than a month, but it's a start: Semantic Organization of Programming Language Concepts.

To add my two cents to the individual-versus-social debate: a social context is in fact a collection of individual minds, so in my opinion we would need to start there. However, there are emergent properties that arise from the interaction of the individuals, aspects not present at the individual scale. Those emergent properties need to be identified and addressed as well. Perhaps these are two reasonably separable topics.

Not so sure...

Setting aside all the usual critiques one might make (insufficient sample size, etc.) about your study, I want to zero in on the central premise: that the clustering of words elicited correlates to the semantic model of the domain.

Rather than assume this, would it not be simpler to assume that different styles of programming have different communities with different associated jargons?

For example, the jargon use of the word "prime" in your paper is idiosyncratic to the literature of psychology. I would predict that someone who was a psychologist and a mathematician who was "primed" with either psychology or math tasks would show different associations with the word "prime". I wouldn't expect this to tell me anything about how they actually think about either discipline.

The concern regarding the

The concern regarding the assumption that the clustering of terms is related to how one would actually apply those concepts in a problem-solving situation is certainly warranted. The key inference made with models like this is that related terms mentioned in response to a cue term are likely to be the ones the participants would also consider when working an actual problem, reasoning, drawing analogies, etc. This hits the classic problem of lab vs. nature and whether or not the findings remain valid outside the lab. However, there's a fair amount of literature on these kinds of models with respect to novice-vs-expert comparisons, cognitive changes due to disease, etc.

Setting aside all the usual

Setting aside all the usual critiques one might make (insufficient sample size, etc.) about your study, I want to zero in on the central premise: that the clustering of words elicited correlates to the semantic model of the domain.

[Whoops, should've read the article first..]

Got it right though: this is an assumption, mostly proven, which is used by most linguists.

May I humbly suggest....

this is an assumption, mostly proven, which is used by most linguists.

I can only humbly suggest that either you are over-simplifying or that you lack any familiarity with the field of linguistics as a whole.

This technique is widely accepted as a basis for experiments in psychology and psycholinguistics (a specific sub-discipline of linguistics). However, to say that it is "mostly proven" is very arguable. I think it tends more to be a case of a lack of better techniques for dealing with a tricky area.

You do understand

... that I am laughing my head off, right now?

Given what I know, his definition of a domain/semantic model is as good as any; and surely good enough for investigation. Do you believe the point you raised is that relevant?

But, ok, if you are the expert, feel free to discuss. I liked what he did.

Love the Semantic Domain Models

You should really ask someone like Oleg to draw a domain model ;-).

Interesting

This site needs greater participation from people interested in more than programming languages themselves. I've been trying to get one of my best friends, a geologist and an avid R programmer, to consider participating occasionally, as I think a lot of people here would be interested to hear what he has to say. :-)

I'm a bit skeptical of your methodology as well, but I think Marc's carping is somewhat unfair. My bigger complaint would be "just three programmers?!"

Here is an anecdotal, though possibly useful, insight into functional programmer psychology: I have to admit my first impression of pkhuong's code was exactly the same as Qrczak's and Winheim's. And Winheim even had the benefit of seeing somebody else make the same mistake... so I suspect the code tricked a lot of functional programmers reading it, not just us three.

do-notation?

I think that monad do-notation might be a useful case study. I have an intuition to match a for-loop (in C, say), or recursion, or pattern-matching, or any number of constructs, but I am struggling to get one to handle do-notation. The problem is that each monad reduces to its own very special form. Consider this expression, which is close to something I came across on the web:

do x <- t
   u
   return (g x) -- (1)

Define the ST type:

type ST a = a -> (a,Int)

Define return and bind for that type. Now, take (1), desugarise, replace bind and return, reduce and I get:

\r -> (g (value (t r)), state (u (state (t r)))) --(2)

where state = snd and value = fst extract the state and the value from a pair.
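For concreteness, here is a sketch of the definitions I have in mind (note that for bind to typecheck, the state must be the function's argument, i.e. ST a = Int -> (a, Int)):

ret :: a -> ST a
ret v = \s -> (v, s)                  -- yield v, leave the state alone

bind :: ST a -> (a -> ST b) -> ST b
bind m k = \s -> let (v, s') = m s    -- run m for its value and new state
                 in k v s'            -- feed both to the continuation

Desugaring (1) gives t `bind` \x -> u `bind` \_ -> ret (g x), which reduces to (2).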

Do the same process in the list monad, starting with a modified version because I prefer to call lists l and m:

do x <- l
   m
   return (g x) -- (1)

and I get this:

concat (fmap (\x -> concat (fmap (\_ -> [g x]) m)) l) -- (3)

OK, so I see that there is an analogy here between 'list shape' and 'state transformer', and in each case the thing in question is being used to apply a basic function, g. There are still differences: the state transformer doesn't treat state separately from the value; the state interacts with generating the value. In the list shape, the shape introduced by m doesn't have the same interaction with the value. I suppose that sort of interaction might be seen if the shape monad related to tensors, where there would be interaction between off-diagonal elements.

But back to the notation: wherever I see an expression in do-notation or bind-notation, I can only, so far, understand it if I have first taken an example such as (1) and converted it into the equivalent of (2) or (3). This feels similar to interpreting obfuscated code, where the initial statement of the code cannot be understood until some transformations have been applied.

do-notation and bind-notation emphasise sequence, but there is much more to a monad than that. Monads, ignoring the IO monad as an exceptional case, are no less functional than any other part of Haskell. In evidence of that, let me just observe that wherever there is a monad, there is also some function of the form

extract :: Monad m => m a -> a

I would like the notation to show me the underlying idea of a monad in general, so that I can see what I am doing with (1) for _any_ monad. As I remarked above, the required intuitions for many other constructs are well-understood. I think it is a worthwhile aim to get such an intuition for monads.

I would like the notation to

I would like the notation to show me the underlying idea of a monad in general, so that I can see what I am doing with (1) for _any_ monad. As I remarked above, the required intuitions for many other constructs are well-understood. I think it is a worthwhile aim to get such an intuition for monads.

Part of the problem here is that monads abstract over a class of exactly such intuitions. So in some sense you'll never get "an intuition" for monads. That's the whole point! However, you can develop an intuition for exactly which intuitions are monadic. ;)

In evidence of that, let me just observe that wherever there is a monad, there is also some function of the form

extract :: Monad m => m a -> a

Incidentally, this is not true.

counter-example

Incidentally, this is not true.

For a counter-example, consider the list monad mentioned in the parent. Also, the 'ST a' presented should, I assume, be Int -> (a, Int).
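Concretely, a sketch: with m = [], extract would need the type [a] -> a, and no total definition exists, since the empty list has no element to produce:

extractList :: [a] -> a
extractList (x:_) = x
extractList []    = error "no value to extract"  -- forced to fail here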

Incidentally, this is not

Incidentally, this is not true.

Yeah, you need a generalisation to account for things like error monads, where you genuinely might not have an answer, not to mention IO-wrapper monads and the like. You can kinda cover the IO-wrapper case by saying the IO monad is the 'host' language rather than Haskell, though.

Matt is correct...

... on both counts. Matt and Phillipa also make good points here.

Saying something is a monad really isn't saying much at all, so you usually can't get much of an intuition for what a bit of code does just by virtue of the fact it has a monadic interface.

As for how to understand monads, it takes work, and time. My suggestion is simply to get familiar with *lots* of different examples of monads, preferably some advanced examples as well. I've been considering trying my hand at an advanced monad tutorial, that includes some of the most interesting (and in-depth) examples of monadic abstractions I'm aware of.

Monads are useful as a standard kind of construct that enables you to express a surprising variety of things. And since they can express almost anything, it's hardly surprising that they offer very little insight into anything in particular.

The notation really does

The notation really does show you what's going on in general, though in a rather abstract way. Bind corresponds to a single-variable ML-style let, and do-notation to a sequence of them ending in a result. What you don't have is a fixed evaluation order. In fact, you don't even have guaranteed determinism: the list monad effectively gives this little language a non-deterministic interpretation.

To simplify: monads are about binding. This includes binding and using other computations.
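For instance, a minimal sketch in the Identity monad, where the let analogy is exact:

import Data.Functor.Identity

viaLet :: Int
viaLet = let x = 1 in let y = x + 2 in x * y

viaDo :: Int
viaDo = runIdentity $ do
  x <- Identity 1         -- binds x, like "let x = 1"
  y <- Identity (x + 2)   -- binds y, like "let y = x + 2"
  return (x * y)          -- both definitions evaluate to 3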

Comprehending monads

See Wadler's "Comprehending Monads" for an alternative notation over monads based on list comprehensions; your examples would look something like

[g x | x <- t, _ <- u]

In Scala your examples look like

for {
  x <- t
  _ <- u
} yield g(x)

And in C#'s LINQ

from x in t
from temp in u
select g(x)

Honestly, I think these notations do a better job than "do" of guiding intuition for some kinds of monads, but significantly worse for most, especially those that aren't particularly collection-like.

Thanks to everyone for these

Thanks to everyone for these points. I hope to get a firmer grip on monads over time.

I did raise the idea more on the basis that a cognitive model of 'do' might be a useful thing to have than to seek advice on my own deficiencies, useful though that has been. Some of the responses suggest that 'do' might not be a good candidate for such a model. Anyhow, thanks for the discussion.