What are The Atoms of Programming Languages?

Atoms are elements that cannot be divided any further into other elements. There exists a finite set of atoms, small relative to the set of all possible physical substances that can be constructed from them.
Can we define a finite set of programming atoms that all existing programming languages can be built from?
Is it possible to build a Periodic Table of Programming Elements similar to the one discovered by Dmitri Mendeleev for the physical world?
Would such a Periodic Table of Programming Elements allow us to envision the properties of programming languages that do not yet exist?
Chomsky and other linguists have already discovered atoms of natural language. (See for example "The Atoms Of Language, The Mind's Hidden Rules Of Grammar" by Mark C. Baker. Excellent work!)
Using these linguistic atoms, any of the several thousand human languages existing today or used in the past can be constructed. Moreover, entirely new natural languages can be built from these linguistic atoms.
The atoms of natural language turned out to be parameters used in recipes for building all other language constructs.
The following recipe reflects one of these parameters/atoms:

"Heads precede phrases in forming larger phrases in English-type languages".

"Heads follow phrases in forming larger phrases in Japanese-type languages".

The parameter used in these recipes is called the "Head Directionality Parameter". This natural-language parameter (atom) defines how more complex phrases are built from simpler ones by adding new "heads" to a phrase.

For example, for both language types we can build a complex noun phrase:
"all existing programming languages"

from a simple phrase:
"languages".

Steps to build complex noun phrase in English-type languages:

1) languages
2) programming languages
3) existing programming languages
4) all existing programming languages

Steps to build complex noun phrase in Japanese-type languages:

1) languages
2) languages programming
3) languages programming existing
4) languages programming existing all
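The two recipes above differ only in the side on which each new word attaches. A toy sketch of that single-parameter recipe (the function name and interface are illustrative, not from Baker's book):

```python
def build_phrase(base, additions, attach_left=True):
    """Attach modifiers one at a time; the attachment side is the parameter."""
    words = [base]
    for w in additions:
        words = [w] + words if attach_left else words + [w]
    return " ".join(words)

modifiers = ["programming", "existing", "all"]

# English-type order: each new word attaches on the left
print(build_phrase("languages", modifiers, attach_left=True))
# -> "all existing programming languages"

# Japanese-type order: each new word attaches on the right
print(build_phrase("languages", modifiers, attach_left=False))
# -> "languages programming existing all"
```

The same steps, the same words, and one flipped parameter produce both word orders.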

There are other atoms / recipes that apply to natural languages existing today, used in the past, or merely envisioned and yet to be invented.

But what about programming languages? What are their atoms? Being just another breed of language, programming languages may also have parameters in the role of atoms, as their natural siblings do.
So what finite set of parameters could be used to construct C++, J, Java, Lisp, ML, Oz, Erlang, Basic, Fortran and Perl from the same building blocks? And what are the building blocks?

For starters I propose these building blocks from which any programming language can be built:
Object, Variable, Program Scope, Closure, Lambda / Function, Template

As for Parameters (atoms) I would start with Assignment, Binding, ... what else?


Whoa Nelly!

Chomsky and other linguists have already discovered atoms of natural language.

I won't cast aspersions on Mark Baker's book, not having read it (though I'm familiar with many of its arguments). However I feel compelled to point out that the above is a very strong claim that has not even remotely been proven to any satisfaction, unless you've already drunk the kool-aid.

As for Parameters (atoms) I would start with Assignment, Binding, ... what else?

As I responded in another thread, I think CTM has a pretty good inventory that should do the trick for most things.

OT

People interested in this sort of thing might enjoy reading Pinker's reply to Fodor, So How Does the Mind Work?

Answer from other thread...

Yet, talking about programming language atoms I was thinking not about language features but about basic elements that language is built from

I'm not sure what distinction you are drawing here: PLs are much more semantically and syntactically constrained than HLs in most respects, and completely free of HL constraints in others.

As an example of the latter, you can choose a different basic syntactic order for every type of expression if you don't mind being inconsistent. (Think of Perl as an example ;-) )

I think what you call "features" is really the only significantly interesting taxonomy you could exhaustively list.

Term “feature” does not mean the same thing as “atom”

The term “feature” does not mean the same thing as “atom” does. From bricks, for example, one can build both a 3-level house and a garden wall. The features of a house differ significantly from those of a stand-alone wall.
Speaking about PLs, you can talk about features as CTM does: concurrency, constraints, first-order logic, lazy evaluation, pure functional data structures, etc. Those are PL *features*.

PL Atoms are yet to be discovered, I think.

Respectful Disagreement

Dmitri Kondratiev: PL Atoms are yet to be discovered, I think.

I guess this depends upon how you define your terms, but in what sense is the SK Combinator Calculus not "the atoms of programming languages?"

Oh, wait: perhaps it's this instead!

I agree, one or another variation of lambda calculus

I agree, one or another variation of lambda calculus can be used to define PL atoms, I think. But what are the elementary "parameters" that can be tweaked so that Lisp, Cobol, Perl and Haskell can all be built from the same 'raw' PL material? And what is that 'raw' language material?

Is it possible to build all the distinctive features / computational models of any PL from lambda calculus alone?

It looks like 'raw' PL material depends on the parameters involved in PL building, and vice versa. In other words, elements and parameters are interdependent and don't exist without one another ...

Nominal candidate for atomicity

Is it possible to build all distinctive features / computational models of any PL from lambda calculus only?

All semantic features, sure. In fact, traditional denotational semantics uses a variety of lambda calculus as the meta-language in which to define the semantic features of programming languages.

But the same kind of mapping would be possible for any Turing-complete computational system. If you're looking for universal "atoms" which apply across all computational systems, then whatever you find will have to constitute a common denominator between lambda calculus, SK combinators, Turing machines, and an endless variety of other possible computational systems.

The most interesting candidate I can think of at that level is names. Naming is the most fundamental form of abstraction. Although there are systems and languages which avoid names - e.g. combinator calculi such as SK have a very small, finite number of names - the expressive power and usability of a language is strongly correlated with its ability to name things.

An examination of lambda calculus from the perspective of what it does with names shows that it's all about managing names. The abstract syntax of lambda calculus has just three rules: abstraction, application, and variable reference. These correspond to defining a name with a clearly-defined scope, binding a name to a value, and accessing the value bound to a name.

This is about as atomic as it gets in programming languages. Practically speaking, you have to have names to do anything useful, and having names implies having the ability to define them and their scope, bind them to values, and access those values. That seems obvious.

The really interesting part is that given only names, and a small but sufficiently general set of operations on names, we have a complete computational system, and we don't really need anything else. If John Lennon had been a PL guru, he might have sung "All you need is names..."
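As a sketch of that reading, the three syntactic forms of the lambda calculus can be written down directly as a tiny interpreter. The class and function names here are illustrative only:

```python
from dataclasses import dataclass

@dataclass
class Var:          # variable reference: access the value bound to a name
    name: str

@dataclass
class Lam:          # abstraction: define a name with a clearly-delimited scope
    param: str
    body: object

@dataclass
class App:          # application: bind a name to a value and enter the scope
    fn: object
    arg: object

def evaluate(term, env):
    if isinstance(term, Var):
        return env[term.name]                     # look the name up
    if isinstance(term, Lam):
        # a closure: entering the scope extends the environment with the name
        return lambda v: evaluate(term.body, {**env, term.param: v})
    if isinstance(term, App):
        return evaluate(term.fn, env)(evaluate(term.arg, env))

# (\x. x) applied to y, where y names 42: define x, bind it, access it
identity = Lam("x", Var("x"))
print(evaluate(App(identity, Var("y")), {"y": 42}))    # -> 42
```

Everything the interpreter does is indeed name management: extending an environment, looking a name up, and delimiting where a name is visible.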

It should be noted that the above doesn't take syntax into account at all. Syntax also involves names, but that's not all there is to it: it's about structure, the relationship between tokens in the source of a program, where the tokens are either names, or syntactic sugar. In the atomic analogy, syntax would be some aspect of the bonds between atoms, although describing PL syntax as "electromagnetic force" might be pushing the analogy a bit far!

Names are good candidate

This is about as atomic as it gets in programming languages. Practically speaking, you have to have names to do anything useful, and having names implies having the ability to define them and their scope, bind them to values, and access those values. That seems obvious.

I agree. I have been developing an environment for running small special-purpose PL's and doing some dynamic metaprogramming over them. Thus, I've been playing around with general features or "grand universal atoms" of different PL's for some time. My approach has been quite pragmatic though.

Names, bindings and scopes are quite universal. Values are not, beyond the fact that they exist and can have names bound to them. But relations between values, such as equality and subtyping, can be used. So I can say "a = b", but I can't say "a < 50.0".

Another set of general features are points of execution. In declarative languages they are not visible, but they exist in any case, and must often be twiddled with. I decided to organise them in a stack structure, but I'm not yet fully convinced how general that will be.

I have to remind you once again that my approach is quite pragmatic: I'm trying to find general features that cover enough languages with enough expressive power.

Points of execution

Another set of general features are points of execution. In declarative languages they are not visible, but exist in any case.
That's a very good point. There should be some universal PL parameters describing the execution / concurrency model that a PL has a notion of. How do you define a "point of execution"?

Objects and Names

The really interesting part is that given only names, and a small but sufficiently general set of operations on names, we have a complete computational system, and we don't really need anything else.

Agree, Names are good candidates for PL atoms. But do we imply objects when we talk about Names?
IMO Names exist as long as Objects exist. That, BTW, means that one and the same object can have several names, or aliases. It seems that names without objects are meaningless. Of course you can have operations that work only on Names without affecting any objects. But what does this give a PL?

"names without objects are meaningless"

It seems that names without objects are meaningless.

What about lambda or pi calculi then?

Objects

In the lambda calculus the objects are the lambda abstractions.

Re: Objects and Names

But do we imply objects when we talk about Names?

If names are allowed to name anything other than another single name, then objects (of some sort) are implied.

In lambda calculus, by defining a name, along with a scope for that name, the resulting package consisting of name, scope, and contents of that scope becomes an object (a lambda abstraction, or function).

Of course (and relating somewhat to Andris' point about published names), names also commonly refer to objects outside of the system in which they are used. We use names which have some meaning — refer to some objects — in the program's problem domain. This choice of names has nothing to do with the computational functioning of the program, but it provides us with a way to keep track of the external meaning of programs, as opposed to its internal mathematical semantics.

What exactly is hiding behind Names?

by defining a name, along with a scope for that name, the resulting package consisting of name, scope, and contents of that scope becomes an object (a lambda abstraction, or function).

I understand that. The question is: speaking about PL atoms, should such things as an Object and its Type be defined explicitly, or will implicit definition through names do?
And does the notion of Type need a definition at all? If it does, how do we define it with Names only?
Otherwise, what exactly does a Name denote? One thing, or many at once? If many, we have ambiguity.
Names, Objects, Types and Scope could all be "first-class" PL atoms.
Otherwise we need recipes for how to use Names to define objects / types and scope as well.

Names for computers or humans?

What exactly is hiding behind Names?
I guess it depends on whether your question is epistemological, psychological, or CScientific.

I am afraid that if it's one of the former two, then we risk aggravating the shadow, which will come down in fury and accuse us all of mental masturbation and whatnot. If you're not afraid, chant "names look like a cognitive factor" :-)

But if you meant the CS meaning, then I think you should not use the word "objects", as it has a pretty standard meaning, which is quite heavy for an atom. Oh, is it transuranium?...

Type, scope and points of execution ?

Leaving aside objects for a moment, any ideas on how to represent the notions of type, scope, and the "points of execution" that akallio writes about?

I agree that "object" is certainly an overloaded CScientific term and may look too heavy to be an atom. Yet The Scheme Programming Language, Third Edition by R. Kent Dybvig tells this (among other things) about an object:
[quote]
Although the particular rules for object identity vary somewhat from system to system, the following rules always hold.

* Two objects of different types (booleans, the empty list, pairs, numbers, characters, strings, vectors, symbols, and procedures) are distinct.

* Two objects of the same type with different contents or values are distinct.
... etc.
[/quote]

To avoid ambiguity, a new term for objects in the above sense could be coined ... "entity"?

Objects, types, scopes, and points of execution

Objects

In purely lambda-based systems, "objects" are implemented in terms of lambda: booleans, numbers etc. are all represented as functions. For a practical programming language, that's not considered efficient enough, so it makes sense to implement certain basic types of value as primitive. If you're considering a language in the abstract, it's possible to continue to consider such values as being functions, if it suits you, i.e. they don't have to be considered atomic. If you're modelling real implementations, you probably want to consider the basic data types to be atoms.
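For illustration, the standard Church encodings show booleans and numbers represented purely as functions. This is a textbook construction, not specific to any language discussed in the thread:

```python
# Church booleans: a boolean is a function that chooses between two arguments
TRUE  = lambda t: lambda f: t
FALSE = lambda t: lambda f: f

# Church numerals: the number n is "apply a function s to z, n times"
ZERO = lambda s: lambda z: z

def succ(n):
    """Successor: apply s one more time than n does."""
    return lambda s: lambda z: s(n(s)(z))

def church_to_int(n):
    """Decode a Church numeral by counting applications."""
    return n(lambda x: x + 1)(0)

THREE = succ(succ(succ(ZERO)))
print(church_to_int(THREE))    # -> 3
print(TRUE("yes")("no"))       # -> yes
```

Real implementations use machine integers instead, which is exactly the efficiency trade-off described above: the basic data types become primitive atoms rather than derived functions.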

Types

In a dynamically-typed system, types are represented by tags associated with objects, and you don't need any new "atoms" to deal with them. In a statically (syntactically) typed system, the type scheme is a meta-program which operates on the program's terms. This meta-program can be represented using lambda abstractions, the input to which are the terms of the source program. Such meta-programs can support partial evaluation, i.e. be run before normal runtime, since much of their input (the program source terms) is already available. This achieves static type checking. No new atoms are needed.

Scopes

As I wrote previously, using names at all implies abstraction, scopes, binding, and dereference. I'll leave it to you to characterize these as atoms, fermions, or whatever. ;)

Points of execution

These sound like continuations.
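For illustration, a continuation reifies "the rest of the computation" as an ordinary function, making the point of execution an explicit value. A minimal sketch in continuation-passing style (the function names are made up for the example):

```python
def add_cps(a, b, k):
    """Instead of returning a + b, pass it to the continuation k."""
    return k(a + b)

def square_cps(x, k):
    return k(x * x)

# Compute (2 + 3)^2 with every point of execution made explicit:
# after the addition, the continuation squares; after squaring, it stops.
result = add_cps(2, 3, lambda s: square_cps(s, lambda r: r))
print(result)    # -> 25
```

Because each continuation is a first-class value, it can be stored, discarded, or invoked twice, which is what makes continuations a candidate model for manipulable "points of execution".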

Steele and Sussman's Lambda Papers cover all of these issues quite well, from the point of view of lambda as an "atom". Although lambda certainly isn't the only possible atom, it provides a good case study of composing a language from a very small set of abstractions, and there are real languages, like Scheme and the functional languages, which follow this model.

E- and I- languages

First of all, thanks for a comprehensive definition of objects, types, scopes, and points of execution from the point of view of lambda-based systems!

In language atoms quest it is also interesting to look at other similarities / differences between HL and PL.

In "The Atoms of Language" Baker defines two generic aspects of every HL: E-Language and I-language.
E-Language - Extensional language - is a language specification through concrete examples of its sentences. An E-language includes all the peculiarities of its grammar and usage. That's why, for example, the English and Japanese E-languages have so little in common. A complete E-language is defined by providing *all* the sentences valid in it. Though an E-language can never be fully defined, everybody successfully uses one every day.

I-language - Intensional Language - is the one that is internal to the mind of the speaker. An "I-language" is a collection of recipes that tell how to make and understand any sentence conceivable in the language. Giving a complete definition of a HL (English, for example) as an I-language is impractical, because no complete English grammar exists. Though there is no complete list of grammatical rules for English, the I-language lets you at least conceive of every possible correct sentence of the language.

Some linguists argue that all HLs have one and the same I-language, though drastically different E-languages.
For example English and Japanese both:
- have nouns, verbs, adjectives, pre/post-positions, auxiliaries etc.
- both build sentences in piecemeal fashion
- have the same rule for combining small phrases in bigger phrases
- etc.

So, it looks like all of the above are features of the I-language, which is one and the same for different E-languages (English, Japanese).
The only difference is the side on which a new word is added to create a bigger phrase. And this difference is one of the HL parameters that linguists are looking for. There are other parameters, but this one is good for illustration purposes.

Now one can argue that any PL also has an E-language and an I-language, and that all PLs have one and the same I-language. From this it also follows that PL atoms are I-language atoms.
As a real-world example, consider the task of a programmer who has worked all her life with Basic needing to explain her code to a Haskell programmer, and vice versa. Without a common I-language that could be hard, I think.
In this light I am not sure that a programmer using Basic will understand definitions of type, object, scope, etc. through lambda calculi.
So the question here: does it mean that, in principle, different PLs do not have a common denominator? Which would further mean that the idea of E-language and I-language does not make sense for PLs?
Or are we looking for PL atoms in the wrong place?

Common denominators

In this light I am not sure that a programmer using Basic will understand definitions of type, object, scope, etc. through lambda calculi. So the question here: does it mean that, in principle, different PLs do not have a common denominator?

There are plenty of common denominators. Names bound to values throughout the scope of a function is one of the most fundamental ones, which is shared by Basic and most other PLs.

Lambda calculus simply offers a very general theory of such constructs. The fact that Basic is not so general doesn't mean that it shares no common denominators, it merely means that it only represents a restricted subset of the possibilities. A Basic programmer may indeed have some difficulty coming to grips with the full generality of the lambda calculus, but shouldn't have much difficulty understanding those aspects of it which Basic does implement.

However, there may well be aspects of an "atomic" explanation that are non-obvious. For example, programmers in most languages are unlikely, on their own, to conceive of local variables as being parameters to an invisible function that is applied when the variable is initialized, even though that's how lambda-based languages conceive of them and even implement them. Despite any unintuitiveness, this explanation is very atomic — it avoids inventing a new construct to explain something when an existing construct can easily do the job, even if the results are sometimes surprising.
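The "invisible function" view can be made concrete: a local variable binding is sugar for applying a function whose parameter is the variable. A sketch (the names are illustrative):

```python
# Conventional style: a local variable initialised in the function body.
def area_direct(r):
    pi = 3.14159
    return pi * r * r

# The same binding expressed as a lambda applied at the point of
# initialisation, which is how lambda-based languages model `let`.
def area_lambda(r):
    return (lambda pi: pi * r * r)(3.14159)

print(area_direct(2) == area_lambda(2))    # -> True
```

The two definitions are behaviourally identical; the second just makes the binding construct explicit as function application.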

If what you want instead are concepts which average language users would already be familiar with, then you're not looking for the theoretical atoms of PLs, in the sense of small, useful, universal building blocks that can be used to construct any language; instead, you're looking for commonality in people's understanding of PLs.

To find such commonality, you have to develop a theory and a vocabulary like the one that linguists use to describe languages: nouns, verbs, grammar, and more complex properties of natural languages are not something that naive language speakers are particularly aware of on their own — they only learn them, directly or indirectly, from people who've studied language. Similarly, any such theory and vocabulary for PLs has to come from people who've studied PLs. Not all such theory will be immediately recognizable to average PL users. That is, the theory used to describe people's use and understanding of language may not be immediately recognizable to a user of the language, even if all it does is describe features of the language that they understand intuitively.

To put this another way, dogs don't understand calculus, but they do understand how to catch a ball following a parabolic arc.

Where the human language comparison falls down is that major human languages like English and Japanese are assumed to be roughly equal in expressive power and other measures of completeness. This is not the case for many programming languages. Languages like Basic could be compared to Riau Indonesian, which allegedly has no nouns or verbs[*]. If Basic invalidates the common denominator status of lambda calculus, then Riau invalidates the common denominator status of linguistic theories involving nouns and verbs.

[*] Long live Sapir-Whorf!

Re: Common denominators

I agree that, as you write, aspects of an "atomic" explanation are non-obvious, and that the theoretical atoms of PLs, in the sense of small, useful, universal building blocks, may differ from a theory describing PLs based on concepts that average language users are already familiar with.
I am looking first of all for the theoretical atoms of PLs, in the light of the I-language that I wrote about previously.
An I-language is a kind of collection of recipes that tell how to make and understand any sentence conceivable in the language. From this point of view, the common denominator (as you put it) of names bound to values throughout the scope of a function looks promising for PL-atom "fishing".
I would prefer talking about "functions" instead of "lambdas", just for the sake of those poor souls, the Basic users :)

Function vs. lambda

From this point of view a common denominator (as you put it) Names bound to values throughout the scope of a function looks promising for PL atom "fishing".
I would prefer talking about "functions" instead of "lambdas" just because of these poor souls of Basic users :)

I used the term "function" deliberately in the bolded quote above — it's a reasonable term as a lowest common denominator, and users of most programming languages could easily understand how that statement applies to their language.

However, used in the above sense, "function" only communicates the existence of the most basic features of functions that a language supports, e.g. they accept parameters, they can be invoked by name, and they may return zero or more values to their caller.

Since "lambda" derives from a general theory of "real" mathematical functions, it is much more well-defined, and means the same thing (as much as is ever possible) across most programming languages which provide lambda features. Even the differences that occur between such systems are quite well-defined, e.g. call-by-name vs. call-by-value evaluation.

For example, I can say that Perl contains the lambda calculus, or similarly, that Javascript functions are lambdas. This implies a number of specific details about those languages, such as that function definitions can be nested, that functions are first-class values, and that nested lexical scoping is supported. If a Basic programmer were trying to understand how Perl and Javascript functions differ from Basic's, he might find it useful to know what "lambda" means.
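What "functions are lambdas" buys can be sketched in any language with those properties. An illustrative sketch (using Python here as a stand-in for Perl or Javascript):

```python
def make_counter(start):
    count = [start]          # captured by the nested function (lexical scoping)

    def step():              # nested function definition
        count[0] += 1
        return count[0]

    return step              # functions are first-class values

c = make_counter(10)
print(c())    # -> 11
print(c())    # -> 12
```

A language with only Basic-style functions cannot express `make_counter`: it requires nesting, closure over the enclosing scope, and returning a function as a value, which is exactly what the term "lambda" promises.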

re: Function vs. lambda

"function" only communicates the existence of the most basic features of functions that a language supports, e.g. they accept parameters, they can be invoked by name, and they may return zero or more values to their caller.

And you are right: lambda is more atomic and also more generic than Basic's notion of a function, which is a special case of lambda.
So it looks like if a Basic user decides to learn about PL atoms, she will *have to learn* about lambdas as first-class function objects, environments, variable binding and closures. And BTW the same is true of any person who wants to understand physical substances - she will have to learn all the concepts behind the Periodic Table of Elements.
...

Types, names, and scopes = telescopes

An interesting interaction between types, names, and scopes occurs in dependent type systems.

For example (from RECENT RESULTS IN TYPE THEORY AND THEIR RELATIONSHIP TO AUTOMATH):

A telescope is essentially a context of cascading type declarations such as

x1 : A1; x2 : A2(x1); ... ; xn : An(x1; ... ; xn-1)

The x's are names (variables), the A's are types, and each Ai is in the scope of all the x's with lower indices.

Static internal names vs. dynamic published names

Although there are systems and languages which avoid names - e.g. combinator calculi such as SK have a very small, finite number of names - the expressive power and usability of a language is strongly correlated with its ability to name things.
I agree that while Turing completeness does not require names, usability most often does.

OTOH, quite often in practice one wants to differentiate between internal names and "published" ones. Published names usually have explicit type info, are not alpha-convertible, and generally are human-readable. Internal names can have inferred types (or none), are alpha-convertible, and can be unreadable (de Bruijn indices). This probably stems from their position in the program - internal names are statically bound in the bulk of the program, while published ones lie on a dynamic binding frontier (and by dynamic I mean any time later than static, not necessarily runtime - of course, this raises the question of multiple stages of names. Hmm, telescopes?...).

The bottom line I guess is that for pragmatic (uh, applied?) theories it makes sense to have both kinds of names, and have them different.

Both kinds of names

The bottom line I guess is that for pragmatic (uh, applied?) theories it makes sense to have both kinds of names, and have them different.

Ah, I knew I didn't come up with this myself - I had stolen it from James McKinna's paper.

Taking an analogy too far

There exist a finite set of atoms relatively small as compared to a set of all possible physical substances that can be constructed from these atoms.

...

But what about programming languages? What are their atoms?

I think you're jumping the gun here. First, let me say I agree with Marc that we have by no means found "the atoms of natural language," and our understanding of the brain and intelligence in general is quite lacking on a number of fronts.
But second, let me point out that there may not be a set of "atoms" as you propose, because programming languages are, as far as I know, not equivalent to physical substances. (Furthermore, we now know that our atoms are not actually atomic, but that's moot.)
Having said that, I think it is important to search for more languages general enough to represent others; but such languages would likely consist of more than simple calculi, because programming languages are more than what they can compute. The SK combinator calculus and others have no methods of dealing with the different syntaxes of languages, not to mention things such as concurrency and nondeterminism.
Not to say that there aren't any "atoms of programming languages", but let's not get too gung ho about it just yet.

A Methodological Proposal

Let me propose a methodology by which Dmitri (or anyone else) could sort out a meaningful theory of PL "atoms":

- select the range of phenomena to be explained. Give each phenomenon a formal definition.

- propose a set of atoms, each formally defined. Show how these atoms can be composed to explain the phenomena in question.

- prove that your atoms are independent, i.e. that they can't be derived from each other

Then we would all be talking about the same things, and we'd be able to evaluate individual proposals.

A reasonable idea, but...

I think we are now at the stage of exchanging some preliminary thoughts, which is necessary before diving into the method you propose.

By all means this discussion can be taken off-line (and continued by e-mail between interested parties only) if for some reason it is considered not productive here.
Just let me know ...