Abstraction Tiers of Notations

DZone has published 3 my articles:

Few key points from them:

  • Abstractions are not created equal and they could be sorted by tiers depending on structural elements they use and the way these elements are used.
  • The tiers are closely tied to cognitive development stages as described by J. Piaget and M.L. Commons.
  • Each new tier of the abstractions is built upon constructs of the previous tier.
  • Adoption of new tier causes major generation of the general-purpose programming language to appear.
  • Basing on the model, the next generation of general-purpose programming languages will be likely System-Oriented Programming. That will use ‘system’ as a new category of type that could be instantiated. Current dependency injection frameworks are early and incomplete attempts to implement this new abstraction.
  • Basing on the model, the evolution of the database languages should be parallel to the evolution of general-purpose programming languages, with the same abstraction tiers adopted.

Note, I’ve described some elements of this model in some discussions on LtU previously.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Labeled bigraphs, SUO-KIF, do they qualify ?

At first reading i could not even imagine what a system type would look like. But, after a second thinking i remembered that J. Sowa acknowledged that conceptual graphs could be based upon bigraphs. Bigraphs are nestable components connected via interfaces/contracts. So the first candidate i could find is labeled bigraphs. And admittedly i can not name anything else that qualifies. If you are disturbed by diagrams and strongly prefer text then may be you could also consider Knowledge Representation Languages like SUO-KIF.

Conceptual Graphs and BiGraphs

I've actually worked with conceptual graphs/KIF before in the context of BSc. The conceptual graphs are alternative notation to first-order logic. So, it is a 4th tier notation.

BiGraphs looks more promising. Interface/contract is clearly the tier 4 concept (black boxes). Hierarchy of contexts is the tier 3 concept. However, there seems to be some logic over links as well and local-to-graph labels, so it might be the tier 5 concept after all. I'm still looking through it.

The system concept introduces an induction principle over components, links, and environment in the system. For example, spring context startup and shutdown operations are results of application of this induction principle, they are projection of the graph structure. In most IDEs or other plugin-based applications, window menu is a reflection of the components specified in the plugin descriptors (system definitions). So, system provides multiple filtered mappings/inductions from labeled graph of components. And labels are system-local rather than component-local, the separation between environment and component is one of key points for the systems.

Generally, the concepts of the higher tier should be compliable to the lower tier concepts with formal proof of equivalence, however the reverse process should be more heuristic rather than formal, like guessing C program structure from assembly code. So, BiGraphs need to be further examined if they are actually the tier 5 structure or it is something that could be a target for compilation from the tier 2 structure. Like Java class or lambdas could be compiled to function pointers and structures, but they are separate entities that built up simpler structures.

removed

it was misplaced

Black Boxes and Hierarchies

Are Black Boxes a tier above heirachies? I don't think Black Boxes depend on hierarchies in any way. You can have a language with type-classes and datatypes. Datatypes don't seem to qualify even as objects in your classification as they don't have methods.

Stepanov's definition of an object is a block of data with an address and an extent, this seems a more general classification, and one up in abstraction level from basic CPU hardware types (int/float/char etc). We could call this a 'record' where the fields are hardware types to distinguish from your definition of object.

I am not sure about patterns either. You can have a pattern of black boxes, so to me patterns may well be a semantic level of abstraction above objects.

So we can have a language of type-classes and records that uses only tier 4 and tier 0.5 (inserting records between chaos and objects).

I would go with something more like this:

0 chaos
1 records
2 type-classes

OR

0 chaos
1 objects
2 concepts

Concepts (see Stepanov's "Elements of Programming") are 'types' for templates, and are sematics very similar to type-classes.

Type-classes and Concepts remove the need for informal "design patterns" as these structures can be represented in the language itself.

Hierarchies don't really seem to fit into this abstraction framework as they are not necessary for type-classes/concepts, and would be subsumed by then (they are just an example of one "pattern" of object arrangement).

Re: Black Boxes and Hierarchies

The term objects is overloaded in computer science. In the name of the tiers it is ontological object rather than object-oriented object. Possibly better name for the tier would be "single opaque entities" or something like this. This is quite close to Stepanov's definition.

The record that references primitive types is pure pattern concept (tier 2). The record that directly references other record in addition is the hierarchy concept (tier 3). This is a key difference. This is why SQL databases are the tier 2 (other records are referenced only indirectly via matching keys) and Graph and OO databases are the tier 3 (other records are referenced directly).

Black block concept requires to have records that reference something by contract. But in the first place they require a reference, which is tier 3 concept. The difference is that we reference not the concrete type like in the tier 3 types, but interface type (existential type). Another example are generics, where we could write a collection type that interprets parameter type as black box with the fixed contract.

To sum it up, we could see the following type constructors:

  1. Objects = object of primitive type
  2. Patterns = flat map from names (or integers in case of arrays, which are also a kind of names) to primitive types (FORTRAN 66)
  3. Hierarchies = map from names to other records of concrete type (C, Pascal)
  4. Black boxes = map from names to contacts and statements that record implements a contract (C++, Java, C#)

Note: generally, most generic structures can emulate more simple structures, but on other hand, more general structures are compiled to simpler structures to actually execute. This is a false equivalence, as both mappings are partial and its cyclic application gernally does not have a fixed point. When we take most generics concepts as foundation we would miss actual peformance characteristics of the lower tier entities.

Hierarchies

So you don't mean OO style inheritance hierarchies? Perhaps this needs to be made clearer as I am sure it is the first thing I think of regarding hierarchies.

Having read your comment above it seems more like you are taking about trees and other recursive data structures. Things you would write in Haskell like:

data Tree a = Branch (Tree a) | Leaf a

However although the notation is denser (and the reference is not explicitly mentioned) is this really a more powerful abstraction than:

struct Branch {
   union Tree left;
   union Tree right;
}
struct Leaf {
   int value;
}
union Tree {
   struct Branch branch;
   struct Leaf leaf;
}

Actually I think it is, but not because of the hiding of the pointer, but because you can abstract the type parameter of the leaf value in Haskell. So for me parametric types are what increases the abstractive power of Haskell, and this can be realised in C++ using templates:

template <typename T> struct Branch {
   union Tree<T> left;
   union Tree<T> right;
}
template <typename T> struct Leaf {
   T value;
}
template <typename T> union Tree {
   struct Branch<T> branch;
   struct Leaf<T> leaf;
}

Perhaps because you cannot have objects without an address (and extent) the "hierarchies" (if I am now understanding you correctly) are overlooked because "pointer" is a fundamental primitive type in almost all computer systems. Perhaps you can be imagine a system without references/pointers, and this might be interesting mathematically, but from a pragmatic point of view is not something you are ever going to encounter?

If I have understood correctly, then we have primitive types on tier one, simple records and arrays on tier two that cannot reference other records (and require joining as in relational algebra), we have constructions of objects (things like trees and other multi object data structures) on tier three, and abstract data types like modules, or type-classes (API contracts) on tier four.

How would parametric types and templates fit into this hierarchy, as they clearly advance abstractive power?

Pointers are quite high level

Perhaps because you cannot have objects without an address (and extent) the "hierarchies" (if I am now understanding you correctly) are overlooked because "pointer" is a fundamental primitive type in almost all computer systems. Perhaps you can be imagine a system without references/pointers, and this might be interesting mathematically, but from a pragmatic point of view is not something you are ever going to encounter?

The high-level languages without pointers do exist, but they mostly obsolete. The possibly purest such language is classical BASIC, which is a canonical example of the tier 2 language as it was intended for teaching. This language did not support hierarchies at all. There is no block code structure, there are no pointers, there are no data structures (only arrays of primitives and global variables). FORTRAN 66 did no had pointers as well as far as I remember. Assembler languages also do not have pointers as such, they have integers and indirect access operators. The pointer type has been formally defined in 1964. I wonder how LISP got away without defining a pointer/reference type formally until that. The canonical tier 3 language intended for teaching is Pascal, and there are blocks, data structures, and pointers.

I understand that it is hard to imagine in 2019, but such things did exist. I even written some programs in BASIC in the school. In University we have used C and Pascal mostly, and FORTRAN for some linear algebra tasks. Understanding pointers and recursion is actually a major enlightenment on the path of programming. Lather vesions of BASIC were converted to the tier 3 langauge too.

If I have understood correctly, then we have primitive types on tier one, simple records and arrays on tier two that cannot reference other records (and require joining as in relational algebra), we have constructions of objects (things like trees and other multi object data structures) on tier three, and abstract data types like modules, or type-classes (API contracts) on tier four.

This is more or less true, just to clarify, on the tier 3 we have white box structures like C structs/unions. This is also a data dimension. On the code structure dimension, we have the same: tier 1 – pressing buttons on device, Tier 2 – sequences, go to, Tier 3 – block structure and procedures, Tier 4 – lambdas, virtual methods, interface calls.

Generally, in any case where something grows quantitively and human brain has to understand and manage it there is the same sequence of adopting abstraction tiers. For example, for organizational structure: the tier 1 – individual, the tier 2 – flat organizations (or teams within organization), the tier 3 – hierarchical organizations with white box structure (Army, Church, medium business), the tier 4 – big corporations that consists from mostly autonomous business units (Sony, IBM, etc.). Government concepts are the same: the tier 2 – flat tribes, the tier 3 – monarchy, the tier 4 – separation of powers.

How would parametric types and templates fit into this hierarchy, as they clearly advance abstractive power?

The functional languages have somewhat problematic type system. The parametric types are the tier 4 type constructors, because we define type w/o knowing the actual parameter type (so here comes black box concept).

However, the algebraic data types are the tier 3 concept, because the union is closed and we know what data could happen here. This is actually a problem because of it. For example, if we define AST using ADT, then to add new type of node we have to modify original definition, so we could not extend structure in a plugin. Contrast it with Java or other OO languages, where we have AST node interface, and plugins could provide another implementation of that interface. I think the lack of open unions is major problem in Haskell that greatly reduces dynamic extensibility of Haskell applications, I heard that there was an effort on fixing it, but its current status is unknown to me.

Re: Hierarchies and Pointers

I think there's a significant conceptual difference between "hierarchical objects" (like branches of a tree, rooms of a building, or paragraphs of a document) and "referenced objects" (via pointers, URLs, room numbers, or content-addressing by interning or secure hash).

A 'Tree' would usually be implemented via pointers because that's efficient. But we could imagine less-efficient implementations involving deep copies. The important bit is that, without references at the abstraction layer, there is no logical aliasing, no cyclic dependencies, etc.

Via fixpoint combinators, we don't need references even for general recursion. Not even implicit namespace refs. Black boxes - such as existential type values - also don't strongly imply references. Only "shared state" seems to truly need references as a concept.

Of course, any sufficiently powerful computation system can model references (e.g. integers addressing a key-value trie that represents the state of our address space).

I'm hesitant to consider "pointer" a "fundamental primitive" of most theoretical computation models or systems. OTOH, pointers are fundamental to most physical computation models, so come up a lot when compiling.

Pointers/references are fundamental for the tier 3 and above

I think there's a significant conceptual difference between "hierarchical objects" (like branches of a tree, rooms of a building, or paragraphs of a document) and "referenced objects" (via pointers, URLs, room numbers, or content-addressing by interning or secure hash).

The basic idea of hierarchy is replacing a part of the pattern with its representative. The representative is a pointer in general case. If we look at real-word hierarchies on which our brain is trained, we do not copy people or subdivisions when we are working with them. The same is with black boxes, but they add another level of indrection via contracts. We refer to mutable objects via labels/names and these names are local to teams. For example, a team lead is a local reference within team to the specific person. When we want to give it order or bonus, we do not expect the person to replicate on the spot and safely work with copy.

A 'Tree' would usually be implemented via pointers because that's efficient. But we could imagine less-efficient implementations involving deep copies. The important bit is that, without references at the abstraction layer, there is no logical aliasing, no cyclic dependencies, etc.

Note that in this case we still have a name dereferencing. The important thing is that on the level of the parent object, the part of the content is packaged into separate entity and that entity is referenced by representative. On the previous tier name could refer to primitive values. This is a fundamental cognitive operation used by the tier 3 constructs.
Note, pointers to primitives are no exception as they are just referencing to anonymous structure that contains a single primitive value.

So, it is just a pointer that resolves to embedded/collocated object, it would be just a memory access optimization. IMHO it is even not possible to create acyclic graph w/o pointers. Only tree is possible.

Also, if we have to support unknown depth structures, we could not use field offset, we have to use pointers internally to store locations of embedded objects.

Possibly better name could be used instead of “hierarchies”, but I have not figured one. J. Piaget’s version “concrete operations” is too abstract and it does bring any pictures to mind. One has to grind through the text to understand what he meant. “Hierarchies” is more visual, but apparently causes confusion as well as strict tree hierarchies w/o horizontal connections are imagined.

Via fixpoint combinators, we don't need references even for general recursion. Not even implicit namespace refs. Black boxes - such as existential type values - also don't strongly imply references.

In case of black boxes, we generally do not know what we are referencing by the name. Fixpoint operation creates a structure of the unknown memory size and unknown internal structure. I also have a trouble of imagining how it could be implemented without pointers.

Also, all attempts (known to me) at the languages with purely immutable structures were at the tier 4. This hints that black boxes are the minimum tier for hiding shared state. With white boxes at the tier 3, the shared state would have been exposed.

Only "shared state" seems to truly need references as a concept.

The languages w/o shared state are currently an exception. And I think there is a reason for it. While there are easy to reason about non-changing state within these languages, it is more difficult to reason about evolving state in these languages as state change happens in them globally (we need to replace not only one object, but also all referrers to it, so they will refer to new version of state), rather than locally. So, I think that shared state is a fundamental, and immutable state is a restricted case that is easier for reasoning in some cases. Attempts to reduce mutable to immutable were mostly unsuccessful as emulation of mutable over immutable creates higher cognitive load than saved by reasoning about immutable state later.

Newer languages like Scala and Kotlin support immutable and mutable structures. Even Haskell has two sublanguages, one is purely functional language. Other is IO-monad-based DSL that look just like typical off-shelf imperative language with few quirks.

Of course, any sufficiently powerful computation system can model references (e.g. integers addressing a key-value trie that represents the state of our address space).

I'm hesitant to consider "pointer" a "fundamental primitive" of most theoretical computation models or systems. OTOH, pointers are fundamental to most physical computation models, so come up a lot when compiling.

Idea of the pointer that we resolve a name to the group values (or to location of single value as a trivial case). This is a fundamental cognitive operation. And programming language design follows cognitive operation structure. For computers, it would have been simpler if we just entered machine code. All these programming languages were created so we could reason about all these actions in reasonable way. And the higher abstraction tier, the larger number of elements that could be reasoned about. This is a generic human way to deal with complexity.

So, I think that the pointers/references are fundamental for structures of the tier 3 and above.

Directed Graphs

Hierarchies definitely mean tree-like structures, where objects can only exist at one point in the hierarchy. Hierarchies can be implemented only using containment, and can be represented as a string using parenthesis as in "top(left(bottom1, bottom2), right(bottom3))" no pointers necessary.

It seems to me you mean "Graphs" or "Directed Graphs", which can be DAGs (directed acyclic graphs) or just directed graphs in the case where cycles are permitted.

Hierarchies are on the code level

The tier 3 is named 'hierarchy' because hierarchy is the key complexity management tool enabled by this tier. Hierarchy concept leak everywhere as it is highest level complexity management tool available there.

The pointer path a->b->c->d in C assumes local hierarchical view of the world in the term of accessibility tree from the current stack frame. Such memory model supports non-hierarchical data structures, but they worked with using hierarchical projections. Call tree is a strict hierarchy (unless we play with longjump or other funny things).

The problem usually starts when the tier 3 languages work with constructs that are not strictly hierarchical. For example, there almost always an implicit global hierarchy of memory objects in C program in the form of 'owning' pointers. The biggest problem when programming in C is separating these two kinds of pointers. This is because from the pointer syntax the both kinds look the same and local hierarchical projection does not match global ownership structure.

The most of the tier 4 languages just skip this problem with some form of garbage collection as addition of black box constructs makes memory tracking almost impossible.

The situation is not unsimilar to using integers for memory access in the tier 2 languages. We have a complex linked structure in the mind, but we have to use flat namespace of integers to implement it. Here we have a complex graph, but we are using hierarchical projections of it work with it. It is better than flat namespace, but still causes problems.

Directed Acyclic Graph

I think I would prefer the term Directed Acyclic Graph for this then.

Turing Machines

Pointers seem fundamental to the physical implementation of computation, even if not required for abstract computation. I started programming in BASIC on a Commodore-64, so I am familiar with using arrays as the sole complex data structure, with indexes being used as a pointer into parallel arrays (as without structs you needed one array for each "object" property). If you treat each set of parallel arrays as an object-pool for a particular type of object, then you can program as if there were pointers, but I agree it seems a less powerful abstraction.

The point I was making about them being fundamental was that I could not easily imagine a way to make a working physical general purpose computer without pointers. Even a Turing machine has the tape pointer you can manipulate.

Re: Turing Machines

AFAIR, Turing Machine did not use term pointer. It use term 'head' (like in the tape recoder) rather than poiner.

Fundamental is indexed memory access operation (TM head supported shift left/shift right operations over tape). Machine code does not has types and it just uses number in registers an memory. Most of modern CPU have an IP (instruction pointer) register, but this is just integer. The tier 2 languages retricted access to the memory to some restricted areas using arrays and usually provided some escape hatches to access memory by address.

The pointer is a high level costruct that restrict such number usage to indexed memory acces and specifies what we can expect at the location. The invention was that such thing is actually typed.

In the version of BASIC that I used, there were no poiner type and number could have been used for indxed memory access using some read/write memory functions (I do not remember their names already, it was used mostly for graphics). If language does not have a pointer type, a programmer has to remember meaning of the number and how to work with memory at that address in the mind.

Pointers or References

Yes a Turing Machine does not explicitly use the term pointer, but it is a pointer. I would use the term 'Pointer' for the kind of things in a CPU, after all we have 'InstructionPointer' and 'StackPointer' being well used in the literature.

What you refer to is what I would call a 'Reference' something that is type safe, and you cannot do address arithmetic on it. 'C' has pointers not references (you can treat them as integers and do arithmetic on them), a reference is more like an Ada access variable.

Re: Pointers or References

It is a pointer in the mind rather than pointer in the program. Turing machine language does not have a concept of pointer at all. It has concept of the code state and of the current head position. CPU do not have types. They only work with numbers. And some numbers are interpreted as addresses in memory by some instructions. This is a radical difference.

The generational changes in the languages happens when you start writing something new, rather than when you start thinking something new. Usually, the programming community start thinking something new long before than it is supported by programming languages.

The programmer compiles the pointer concept to these low-level concepts when working with assembly language.

Also, for instruction pointer, I do know when this term has appeared. Instruction pointer is one of synonyms for program counter. At the time of appearance of x86, the pointer concept was long known history, so it might be just explanation of lower-level concept via high-level.

Stack register is also a quite recent history. It appeared in late 60s with appearance of the tier 3 languages that supported general recursion. The stack concept appeared early, but still after first generation, so it is useful, but not essential. Even now, some RISC CPUs have stack register assigned by convention rather than in hardware.

I guess to understand whether the concept of pointer is fundamental we need to delve into works before 1964. I personally doubt that we will find wide usage of the term.

Also, generally speaking, the reference is more generic concept than pointer. It includes pointers and other ways to reference things (for example, by key in relational algebra as reflected in the concept of referential constraint). Another example is C++ reference, which is just another syntax for pointer, and with the same integrity problems.

CPU do not have types. The

CPU do not have types.

The floating point and word registers are distinct CPU types. You can't pass a floating point register to an addressing instruction, for instance.

double post

removed

CPU do not have types, instructions make assumptions about data

From Intel's manual:

The fundamental data types are bytes, words, doublewords, quadwords, and double quadwords (see Figure 4-1).
A byte is eight bits, a word is 2 bytes (16 bits), a doubleword is 4 bytes (32 bits), a quadword is 8 bytes (64 bits),
and a double quadword is 16 bytes (128 bits). A subset of the IA-32 architecture instructions operates on these
fundamental data types without any additional operand typing.

So only difference between fundamental CPU datatypes is operand size. These bytes do not have intristic properties on CPU level. Different instructions just place different restrictions on acceptable byte structure. But such interpretation happens only in the context of the instruction. There is no any cross-intruction verification.

Also SSEx and x87 registers could hold both integers and floats (FI* instructions).

It is possible execute FP instruction on SSE register and than to work with result as with integer and move it to main CPU register and than to use it as an address.

For addressing on x86 architecture also not all main registers could be used for all roles. For example, AMD64 R8-R15 could not be used as well. They have to be moved to other registers. x86 as CISC architecture has many funny restrictions on instruction operands, and making some type system ideas about them is a bit naive. This is one of the reasons why LLVM is so popular for compiler writing recently. LLVM provides much simpler and uniform instruction set and hides these inconsistencies.

Types are logical

Types are logical propositions about a program, yes? You've just listed a number of propositions about x86 and x86-64 instructions, where certain registers are allowed or disallowed as operands to certain instructions (as but one example). The fact that all registers are not interchangeable across all instructions means there's a type distinction at work.

These might be pretty weak properties, but it seems perfectly fine to say these instructions have typed operands. The fact that Intel manuals don't call these types doesn't seem particularly meaningful.

Compare these CPU restrictions to the untyped lambda calculus, where there is no such thing as invalid arguments.

Addressing expressions vs types

The fact that all registers are not interchangeable across all instructions means there's a type distinction at work.

No, it is not a type distinction, it is instruction syntax distinction. There is no binary syntax (combination of flags) supported to use needed registers on some instructions. For example, MOVQ2DQ support only moving between SSE and MMX registers, it does not have a syntax to work with memory or general purpose registers. But this fact does not type SSE or MMX registers with respect to MOVQ2DQ instruction.

The DIV/IDIV instructions are other example. The first implicit argument has to be RAX for 64bit case. The result is stored into RAX and RDX. There is no encoding of the instruction that allowed to use other registers for these roles.

If we consider registers as specially addressed memory, then not all instructions have argument encodings that support all memory locations. Some instructions will work only with hardcoded locations, and they could not address other locations.

Interpretation of the register values is also instruction-dependent. IDIV interprets arguments as signed integers, DIV as unsigned. Registers itself just hold bits.

LLVM is really a much simpler model as it hides all this hell from the compiler developer.

No, it is not a type

No, it is not a type distinction, it is instruction syntax distinction.

As per Pierce's definition:

A type system is a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute.

So type systems are syntactic methods, and "kinds of values" is flexible enough to encompass the restrictions you list.

"Types" don't just mean some kind of "shape" or "structure" of data, as that only scratches the surface. CPUs don't have type distinctions in this sense, but instruction operand restrictions can readily be seen or expressed as type distinctions.

But this fact does not type SSE or MMX registers with respect to MOVQ2DQ instruction.

Rather, the MOVQ2DQ instruction accepts only SSE and MMX register types as operands.

Re pointers fundamental to physical computation

I disagree. We have plenty of examples of computation with slime molds, soldier crabs, dominos, and bees that don't involve pointers. It's feasible to develop compute hardware for digital or analog signal processing that doesn't use pointers. The Babbage machines don't use pointers. There's a whole field related to DNA computing and molecular programming, which doesn't involve addressable memory. Etc.

But pointers are fundamental to "addressable" memory models. It's essentially right there in the name. I don't forsee us breaking away from those any time soon, except for specialized use cases.

Even a Turing machine has the tape pointer you can manipulate.

In my understanding, pointers are first-class values that we use to index an external resource. A Turing machine's 'head' is certainly not first-class, clearly not a value, doesn't index a tape so much move along it, and no, you cannot manipulate it - it moves only by the machine's own rules.

In any case, if your definition of "pointer" is too broad and vague, you aren't saying anything useful when you claim it fundamental.

These platforms are too low-level

I disagree. We have plenty of examples of computation with slime molds, soldier crabs, dominos, and bees that don't involve pointers. It's feasible to develop compute hardware for digital or analog signal processing that doesn't use pointers.

Strictly speaking, transistors do not have pointers either, but high-level languages defined for computers built with transistors do have pointers. Let's discuss the issue when high level programming languages over these computing platforms will appear. Something tells me that there will pointers for sufficiently high tier programming languages just because pointers are useful complexity management tool used by humans. Honestly, I'm not that impressed by bees as computing platform, as I do not understand how to add numbers using them, not speaking of more complex tasks like high-frequency trading or building online forum.

Fundamental means it's part

Fundamental means it's part of the base, the foundation. It's exactly the essential "low level" things that are fundamental. Anything we build above the foundation is, by definition, not fundamental.

Physical computation doesn't build upon pointers. We might build pointers and addressable memory upon other physical computation primitives. Pointers are a high level construct in terms of physical computation. Therefore, pointers aren't fundamental to physical computation.

Whether pointers/references are fundamental to a language depends mostly on whether they're provided as a primitive (like in C or Java) or a library (like IORef or TVar in Haskell). So deferring discussion until "high level programming languages over these computing platforms will appear" isn't especially conducive to understanding the platform itself.

Fundamental is relative

The question is where to stop when seeking foundation and what logical layer we are discussing. For transistors, the foundation is atoms and electron movements. For atoms - elemental particles, for elemental particles quarks or whatever physicists fashion now.

If we stay within logical scope of the tier 3 programming languages (ALGOL, C, Pascal, etc.) - then pointers are fundamental. If we consider wide context of programming languages, the pointers are compiled to address arithmetic when we convert the tier 3 representation to the tier 2 representation. The tier 4 OOP and FP programming languages using pointers and references as well. So, whether pointers are fundamental or not depending on the context. Consideration of bees widens context too much IMHO.

The modern software development happens in the context of von Neuman architecture and its derivatives. Running out this in the context of programming language discussion is possible, but without languages or compilers it does not make much sense. The only viable alternative to this architecture are analog computers, that are practically dead except for few niche applications.

While lambda calculus is useful for proving theorems about foundations of computing due it its high level, it is actually too high level to be considered as foundation model of computing. von Neumann found a good place for the boundary between software and hardware and we are still using it.

re fundamental is relative

I agree that what is "fundamental" is relative to our subject. Like, in martial arts, the "fundamentals" are the basic movements. In a programming language, the "fundamentals" are the built-in features and the syntax surrounding them. For physical computation, I'm not sure what the "fundamentals" are, but I'm certain it's well below the level of 'pointers'. Likely candidates are space, matter, and motion.

modern software development happens in the context of von Neuman architecture and its derivatives

I agree. I'd also agree that addressable memory and pointers are fundamental to von Neumann architecture.

If we stay within logical scope of the tier 3 programming languages (ALGOL, C, Pascal, etc.) - then pointers are fundamental.

Modulo safety and philosophy, ALGOL, C, and Pascal are pretty much the same language - imperative structured programming. You cannot effectively generalize to other tier 3 languages if this is your sample set.

Consider VHDL, SAFL, circuit diagrams, various box-and-wire languages. Some of these may involve static memory, and the "compiler" might produce instructions for printing a circuit board. Whether they interface with or utilize addressable memory units internally could be optional, rather than fundamental.

Consideration of bees widens context too much IMHO.

IMHO, your PL horizon is limited, and this has led you to some premature generalizations.

While lambda calculus is useful for proving theorems about foundations of computing due it its high level, it is actually too high level to be considered as foundation model of computing.

Lambda calculus by itself is relatively low-level. For example, you won't even have natural numbers and lists without a Church or Mogensen-Scott encoding thereof. But it's also general, Turing powerful, so we can indeed encode those things. That said, I favor a combinatory logic because it elides those pesky variables and sophisticated substitution semantics, and is thus even lower level.

In any case, I don't feel like arguing about what makes a "good" foundation model of computing. But I can already tell we'd disagree, and that our priorities would be different.

FPGA challenge

There are plenty of free tools out there that can be used to program FPGAs. My challenge would be to design a general purpose stored program computer that does not have pointer registers of any kind. I am not aware of any, this supports the position that pointers are fundamental to a "stored program general purpose computer". If you can't find a literature reference, and cannot design one yourself, then I don't see any realistic challenge to the statement that pointers are fundamental to stored program general purpose computers.

My challenge would be to

My challenge would be to design a general purpose stored program computer that does not have pointer registers of any kind. (...) pointers are fundamental to a "stored program general purpose computer"

A pointer must be a first-class value within the program. So I don't believe that the second-class 'stack pointer' or 'instruction pointer' at the CPU generally qualify as pointers. They would if our language provides means to store and set them. But not generally.

You have a broad and vague definition of pointers that even includes Turing machine 'heads'.

But by my definition of pointers - which I consider vastly superior to yours - the challenge isn't difficult. Even using an existing CPU to repeatedly evaluate a stored, Turing complete, cellular automaton or simulate a Turing machine with the memory as tape, would be sufficient.

We could feasibly create a dedicated processor that does the same, or even a memory-chip that automatically computes a cellular automata (all cells at once).

So here's the challenge for you: explain your 'definition' of pointer and convince me it's better than mine.

Pointers

A pointer is something that 'points to' something else. A tape head clearly points to a position on the tape, which is the current location. With tape we can only increment and decrement the location (no random access), but we have already established that it is not necessary to be able to do full integer arithmetic on a pointer.

Any computer that has memory addressing will by definition have pointers, even if that is just the program counter in the CPU.

So to have a computer without any pointers would require not having a stored program in sequential memory locations. How could we have a general purpose stored program computer without memory addresses? (Referring back, I do not contact consider DNA or slime to be general purpose - they can solve constraint problems and equations, but you can't play Quake on them).

https://yager.io/HaSKI/HaSKI.html

Here is an example project implementing SKI combinators on an FPGA. It is of note that because the state is of variable length, the only option is to store it in RAM and access via a memory bus, which results in having to use memory pointers instead of values. It seems to me that any attempt at actually _implement_ a combinator system in physical reality would necessarily need pointers.

Re pointers

In every language with pointers I've used, a pointer is a first-class value. We can keep pointers in data structures, pass them as arguments, return them from expressions.

So, unless we can do all these things, via machine-code accepted by the CPU, then it doesn't qualify as a pointer. Not even if it's a program counter.

Thus, to answer the question about whether pointers are needed for stored program computing, it's sufficient to ask whether we can craft a usable 'machine code' cannot directly use addressing.

I believe we can. Turing machine is a simplistic example.

I think your definition of 'pointer' is invalid, because it includes second-class things that do not behave as the prototypical C pointers.

Internal Pointers

My point is that you need pointers to implement a CPU. If you read a stored program from memory you must have a memory address to read from. This is what I mean when I say fundamental: you cannot build a CPU without the concept of a pointer. I am not talking about the language built on top of that. It is obvious that we can have a CPU that interprets lambda calculus as its machine code, so you we don't need to have that discussion.

Of course a CPU register that stores an address is a pointer. Even if it is not accessable to the programmer it still exists, it is fundamental because without it there is no CPU.

That is why the challenge was to design a CPU that can be implemented on an FPGA that does not use pointers in its construction. The machine code exposed to the programmer is irrelevant because this is already an interpreted language implemented on the CPU, so it could indeed be SKI combinators (see example I linked above).

I am not saying pointers are fundamental to programming, again the existence of lambda calculus is enough to dismiss that as reasonable. We clearly can have many programming languages without pointers, and yet to provide a computing platform that can execute/evaluate those languages we need pointers. Having a pointer less machine code is simply building an interpreter into the CPU, and yet the CPU itself will need address registers.

There are some architectures, like data-flow, that can exist as stream processing nodes that you arrange as a graph, however the limited number of processing nodes, and the problems of routing data mean that these are not general purpose.

My whole point was you have all these layers of abstraction, but in the bottom one, the actual CPU itself, the hardware, you need pointers. That is what makes them fundamental, foundational even.

re internal pointers

A pointer is a language construct. It's part of a language such as C or x86. A pointer is a first class value within the language. We can store it in data structures, return it from computations, etc. If we can't do these things, it is not a pointer. Full stop.

I'm not happy with this "internal pointers" name. I think it's misleading. But I can't offer a catchy alternative right now, so I can go with it.

Your claim is that we minimally need internal pointers for a CPU to operate on a stored program computer.

I will reject this claim on two different levels.

First, I reject that a CPU is essential to Turing complete computation with a stored program. As I mentioned in passing before, we can move computation into the memory itself. E.g. on every clock tick, every cell of memory could rewrite according to its own state and that of certain wired-in neighbors. Cellular automata already demonstrate that such computation is easily Turing powerful. We could develop a language around such a CPU. (Probably awkward to use, but that's irrelevant.)

Second, I reject that a CPU needs to know "where it is" in memory. Without this knowledge, we instead have a CPU as a stream-processor (no jumps!). In each 'round' the CPU operates on the entire memory as a stream, and outputs a new memory stream. We could shuffle data between banks of memory in each round, performing simple, local rewrites in each step. Terminating when there is no change. Even with simple rewrite rules, this can easily be Turing powerful. It's also likely to be much more accessible to human programmers compared to the first option.

In these cases we do not need 'addressable' memory. Storing the program and processing it in full-memory rounds is sufficient. Essentially, what we lose without 'internal pointers' is only the ability to represent split attention on multiple parts of memory. Thus, without such pointers, behavior and data will be combined, and performance of computation that requires split attention (binary operators on unbounded-size data) may suffer.

This has some implications for efficiency, but also introduces implicit opportunities for parallelism and implicit codata. That opportunity should be obvious in the first case. In the second case, we could pipeline a stream across 100 simple processors for bigger steps. And we could operate on different regions of a stream if we either design for rewrites to be confluent or accept some non-determinism and leave confluence to the programmers.

Could be a nice alternative to what we use today.

Edit: I like the idea, so I'm now developing a streamable rewriting language for bounded-memory processors. Might blog it later.

re Internal Pointers (follow up)

Okay, I've designed Streaming Language Rewrite Processors (SLRP) as an alternative to the von Neumann architecture and addressed-memory computing. An SLRP processor will rewrite an input stream. The stream itself serves as a 'memory' for future passes, but is not addressable. Further, I developed a concrete foundation for a SLRP machine code based on a concatenative combinatory logic, and I sketched out a few useful extensions.

A weakness of pure SLRP is that there is no pass-by-reference within the stream. Moving large data is more expensive than moving pointers around. It is useful for performance reasons to integrate external storage devices and references. However, Turing complete, general purpose computation does not depend on such extensions.

The performance potential is better than I initially imagined. A processor can leverage internal buffers to reduce data-plumbing costs and have a larger 'sliding window' over the input stream for tight loops. Processors are easily pipelined. And it's feasible to partition a stream across several different groups of processors.

I wouldn't have bothered to spend a week on this, except this result is quite applicable to my Awelon language project. So I'll remember this as one of those exceptional cases where 'argument on the internet' has a useful outcome.

Sounds nice

Sounds nice, would be great to see some of your work on this. I guess I don't understand how you address within the window without a pointer? Also how do you deal with branching? If each processing node only handles a single instruction, then it's going to require a lot of silicon to do quite basic things? If each processing node handles several instructions on the stream it's going to require a program counter (pointer).

When I worked on a streaming processor design back in the 1980s if you wanted to avoid all memory addressing you had to use routing instead, each processor was a node in a hypercube graph, and you can input or output a stream from each of the possible links. For complex algorithms routing became a problem, as you had to get data streams from one port on one processor to another port across the hypergrid, and it had all the difficulties of place and route optimisation for digital circuit design.

It would be great to see how you solved these problems.

clarification

Also how do you deal with branching?

Church encoding works well for branching.

If each processing node handles several instructions on the stream it's going to require a program counter (pointer).

No. A program counter refers to a program stored in an external, addressable memory. With SLRP, the program is what we stream through the processor. There is no external instruction on how to handle the stream.

Further, even internally, a processor just chooses *any* available rewrite, outputs the stream up to that point, performs a rewrite, repeats. Any order of rewrites is okay. No need for a consistent instruction pointer even within an internal buffer.

I guess I don't understand how you address within the window without a pointer?

Sliding window is just an optimization: buffer some output internally (so we haven't committed it to observable processor output), and keep it available for more rewrites. No need to "address within the window".

When I worked on a (...) hypercube graph (...)

The SLRP architecture is much simpler. Just one input stream (of a 'machine code' language with term-rewriting semantics) that we rewrite to one output stream. No routing needed, or even possible. The stream contains both behavior and data. The output may vary in size from the input (e.g. due to copy and drop). The output stream might need to be processed again many times, but that's outside the processor's control. Besides this, we may have side channels and effectful rewrites. But those aren't essential.

Being limited to one stream in one direction does hurt performance. I've been contemplating higher dimensional variations on SLRP. I have vague ideas about some form of bi-directional stream rewriting, or even multi-dimensional models. But nothing solid yet. There are several semantic issues I haven't figured out.

Single Reduction?

So mixed data and program get fed through the processor which outputs a "reduced" form of the data and program? Presumably this means the processor only performs one reduction/rewrite on the data/program as it passes through? You would then need some kind of external storage? Something like the old tape based batch processors where you play the stream from the source tape through the processor and store the result on an output tape, which you then move to the input side and repeat with be new output tape. You probably want at least two inputs and two outputs as merge and split were fairly common operations.

So mixed data and program

So mixed data and program get fed through the processor which outputs a "reduced" form of the data and program?

Yes. That's right.

Presumably this means the processor only performs one reduction/rewrite on the data/program as it passes through?

A minimalist implementation of the processor might do at most one rewrite per region of code, having no output buffer. Not quite same as per pass: given `+(8)+*(4)(7)*(3)(2)` we'd opportunistically do both multiplications in one pass, but then the following two additions would take two more passes.

But we're also free to 'buffer' a portion of processor output. If we haven't fully written to the processor's actual output port, we can take another pass internally.

For example, take a simple stream like `+*(13)c(3)`. We can't do anything with `+*` so we just write `+`. We can't do anything with `*(13)c`, so we write `*(13)`. Now, `c(3)` we can write `(3)(3)`. All we can do without buffered output.

But if we've buffered our output, then our output buffer contains `+*(13)(3)(3)`. We can replay it, handling `*(13)(3)`, and our next output buffer is `+(39)(3)`. Repeat again to get `(42)`. We can't do anything more with `(42)` so we commit it to our processor's "real" output.

Or if the output buffer is only 11 characters and we had to flush the '+' early, our output might be '+(39)(3)'.

In the general case, a processor with an output buffer that's larger than a loop expression could perform an unbounded number of reductions before producing a single byte of output or reading more input. Infinite loops are a concern. But we also could push some output early if a downstream processor signals starvation, or if we hit a heuristic effort quota and just want to move on.

You would then need some kind of external storage?

The stream itself is a form of external storage, yes. Your challenge did explicitly require a "general purpose stored program computer". However, the stream is not 'addressable' storage. Nor do we need addressing and pointers to process it.

Intriguingly, it doesn't need a stable storage location. For example, if we have a stream that varies between 2-4MB in size, we could circulate it round-robin through a dozen processors whose total buffer sizes are above 4MB. Via effects, these stream processors could drive attached devices until the power runs out. (Though, for convenience, we might get started by having one processor in the circle read the initial stream from a ROM.)

You mention use of tapes, streaming between a pair of tapes. That's perhaps the most simplistic and inefficient way possible to use SLRP systems. But, yes, that works.

If you pipeline a stream through ten processors before storing to another tape, you'd multiply efficiency by almost ten.

You probably want at least two inputs and two outputs as merge and split were fairly common operations.

Two inputs and two outputs are indeed a variant I've contemplated. Both unidirectional and bidirectional variants. But again, I have not found a semantics that works nicely for deciding where to interact between streams. And by 'nicely' I want confluence up to effects, easy abstraction, etc.. I'm contemplating a few explicit rendezvous models.

Merging

A classic example to think about for two inputs is where one is a set of bank accounts, and the other is a set of transactions. Bank accounts would be stored in account number order, and you can pre-sort the transactions into account/time order.

However I am not sure how you would achieve this with combined data and program? As you say there would need to be some kind on rendezvous on account number, as you can't rewind the stream, I think it's safe to assume both are sorted. You would give control to the transaction stream that would then 'wait' on the account number in the account stream.

How would you combine the two program fragments from the different streams? I guess you would "apply" the transaction to the account.

So the rendezvous would basically be "wait and apply" wait for a rendezvous in the other stream with a matching ID and then apply the following expression to the expression following the rendezvous in the other stream?

Re Merging

Representing a merge of lists or trees in SLRP isn't difficult - it's Turing complete and reasonably expressive. However, in pure SLRP it would be slow because it requires a lot of swap operations, which are inefficient for large inputs in SLRP (because no pass-by-reference).

We could solve performance via impure SLRP, e.g. using virtual-memory 'references' for volumes of our database and transaction logs.

The other possibility is to extend a pure SLRP to efficiently work with multiple streams. However, I really don't want to handwave about whether this is feasible with nice semantics. So I won't answer your questions about rendezvous semantics.

Efficiency and Generality.

I don't think efficiency is really a concern. I don't think this kind of approach would be efficient for any hardware, in reality you would just get a CPU with embedded USB and stream over that, implementing the language with an interpreter.

I certainly am interested in how you resolve this, it's an interesting topic.

Returning to my point about pointers, is this really a general purpose architecture? On the one hand if it's Turing complete it should eventually be possible to port Quake to the architecture. On the other hand, if it's not a "natural" fit for the architecture is it really general purpose?

re efficiency and generality

If natural fit for all programs is a requirement, then an architecture for general purpose programming does not exist. Prove me wrong. :D

I'd rather not quibble definitions of 'general purpose' and 'natural fit' and other moving goal posts ad nauseam. I would consider SLRP to be just as general purpose as a von Neumann architecture, which has more than a few of its own awkward aspects.

I still have several ideas for improving efficiency. I'll be experimenting with them this week.

Abstract Pointers

Even with this architecture there is going to be a stream pointer, that is the current symbol, which is like the head of a Turing machine pointing at a particular point on the tape, which I mentioned earlier in this discussion.

In effect a stream is an uncopyable forward iterator. An iterator is an abstract pointer. So there are still pointers in this architecture, from a certain point of view.

Nope

You stretch the word 'pointer' far beyond what most people mean when they talk about pointers in programming or their properties.

like the head of a Turing machine pointing at a particular point on the tape, which I mentioned earlier in this discussion.

Did you miss the part where I refuted you earlier in this discussion? Turing machine head isn't a pointer.

It Points

The head of a Turing machine clearly points at something. Without it how do we know what the current symbol is? A pointer is something that indicates a position in something else, like an index. A memory address is just an index into an array of bytes after all. I am not talking about a "pointer type" as you might define in a type system, that is an artifact of a programming language, not the computational hardware on which it runs.

How can you have a Turing machine without something to point at the current symbol?

It has a location

Having a location is not the same as pointing at a location. And more importantly to 'pointer' in context of programming, the Turing machine head's location is not a value that we can record or communicate within the Turing machine system.

Turing machine can be understood as a simplified form of mobile agent systems. In mobile agent systems, we have stateful 'agents' that move through a stateful 'environment', sensing and manipulating it according to some rules. (The exact details can vary, e.g. we could model sounds, pheromones, rooms with inventories, distributed physical computers, etc..) In the general case, the agent does not know its location in the environment, it can only detect its environment via sensors and remember its prior observations and actions. With a Turing machine, we only have one simple agent - the head - and our environment is an infinite tape of symbols.

Mobile agents have a location in the environment, but they are not "pointers" to the environment. This is perhaps most obvious in the general, open distributed systems case where you, as a human, are implicitly part of the mobile agent system. You are not your address. And it's a coordinate such as street address, GPS, URL, etc. that you'd compute, record, or communicate as an address, reference, or 'pointer'.

A Turing machine head is not a address, reference, index, or pointer. It's just an agent with a location in the Turing machine system.

We humans could use a phrase like "current location of the Turing machine head" as a reference. But it would be an error to confuse or conflate our external use of such references with the Turing machine's use of references. It would also be an error to confuse the reference (the quoted phrase) with the actual Turing machine head or its location. You're committing both of these errors when you insist that Turing machines use pointers because, from your external POV, the head is a pointer.

A pointer is something that indicates a position in something else, like an index.

True, but insufficient. Addresses, references, indexes, pointers, etc. also have other important properties, such as being language constructs that we can compute, record, or communicate.

At a fundamental level, the concepts of references, addresses, pointers are tied to language - the representation and communication of information. The existence of space and locations is not enough. Addresses only exist as part of a system for thinking or talking about location within a space. In programming, pointers would be constructs in a programming language. In hardware, pointers would be constructs in a machine code language.

Agents?

This is the difference between the observed and the observer. The observed thing has a location. The observer has a point/index/focus. The observer is not the thing it is observing.

A Turing machine head is not a address, reference, index, or pointer. It's just an agent with a location in the Turing machine system.

All this seems a bit vague and hand-wavy. Yes its very east to talk about agents, but agents don't actually exist in hardware. Turing machines don't actually move. So in an abstract theoretical sense I agree with you. But in a concrete actually implementable sense I don't. Transistors do not move about on the surface of a microchip.

re Agents?

This is the difference between the observed and the observer. The observed thing has a location. The observer has a point/index/focus. The observer is not the thing it is observing.

You're missing an important aspect, here: There is no observer of the Turing machine head's location within the Turing machine system.

It is illogical to argue that "I, Keean Schupke, use the current location of the Turing machine head as a pointer, THEREFORE, the Turing machine uses pointers". Because the latter statement has nothing to do with your behavior. You aren't part of the Turing machine.

When we ask whether the Turing machine head is a pointer, the proper full question is, "is the Turing machine head a pointer within the Turing machine system?" Based on the common definition of word 'pointer' in computing as a value representing a location, the correct answer to this question is: no.

If your definition of 'pointer' is a hand-wavy 'anything that points' then the correct answer becomes: stop the bullshit.

agents don't actually exist in hardware

Humans are already mobile agent hardware. They just are rather difficult to program, and even more difficult to debug after they go wrong. :D

Turing machines don't actually move.

Turing machines don't actually exist. Well, not outside of partial toy examples developed for educational purposes (1).

For practical manufacturing, we must constrain the hardware requirements: fixed memory, fixed number of IO channels, preferably no moving parts larger than sub-atomic particles. We widely use von Neumann architecture today in part because it's practical for manufacturing. But von Neumann architecture certainly uses pointers - addressable memory for code and data is an essential part of the definition. SLRP was designed as an viable, pointer-free alternative to von Neumann architecture.

There are other Turing complete stored program machine models we could implement easily in hardware. Cellular automata among them.

Pointers vs. addresses

Let's separate pointer and address/index (or location).

Addresses and indexes were well known at start of computing. What distinguish pointer from address is that pointer is typed address/index (explicitly or implicitly), but pointer != address because it carries additinal information. The pointer is implemented as address. The idea is not so obvious and was only invented at 1964. The pointer is a notation thing. Address is implementation thing.

In many places these could be used interchangely, but on notation level they should be clearly distinguished. With address we have get_int(int_address), with pointer *int_ptr. The distinction is that we had to guess that at int_address location an integer is stored, but with pointer type we have asserted the fact that integer is stored at location of int_ptr somewhere before, so this information is carried on notaiton level and it could be typechecked. On machine code level the code could be the same if 'get_int' is an intristic.

Pointers are a great invetion of the tier 3 language design since we could separate out subset of data and describe it, then we could separate part of behaviour and make it reasoning about that part of data. The reasoning is done in the terms of relationship to other pieces and is more local. In the contrast, the tier 2 langauges had a global namespaces, and we are unable to clearly understand what piece of code touches, we need to inspect the entire code to find all usages of any vairable.

And just to easy up the discussion, there is some video about python loops.

types, pointers, and addresses

Given that much of my experience with pointers involves 'void*' callbacks (or often intptr_t after C99), I'm not a strong believer that types are an essential aspect of pointers. Nor even for object references. Encountered a few too many downcasts from 'Object' using Java or C#, and worked with many dynamic languages. Congruently, the wikipedia article on pointers does not emphasize use of types.

Some languages support a compacting GC: a system can change the address of referenced objects. But these languages usually favor the word 'reference' instead of 'pointer' in their documentation, perhaps to avoid implicit association with pointer arithmetic.

In any case, for purpose of this discussion, I believe that the important aspect is a semantic separation of reference from referent. The subtle, cultural distinction of pointer, address, reference is not relevant here; even mutable records or cons cells are included under the broad heading of references or pointers. Meanwhile, immutable values or linear objects would not have this separation. Thus, for example, use of `dup` on a stack value or a linear stack object should not be considered a pointer or reference, even if it's implemented in machine code by pointers.

pointers to void

Pointers to void appeared quite late in C evolution. Before they were introduced, pointers to char were used for this purpose. I have not found an exact reference when it is appeared. K&R did not have them. Some ANSI C standards started to have them.

The pointers to void are C way to impelment existential types as a pattern. So when callback and void pointer are passed to some method, the callback casts void pointer back to real type. This partern is implemented lambda abstractions or interfaces in the tier 4 languages.

Other place where they are used is work with memory, where they could be also considered as existential type pattern, as malloc function allocates memory region of the unknown type and the type is assigned by caller.

Upcasts and downcasts in Java and C# are just form of pack and unpack operations on existential types.

The dynamically typed tier 3 languages like Prolog used references and GC rather than pointers, and some dialects of Prolog were typed (like Turbo Prolog). I do not know any tier 3 language (with tier 3 code and data structures) with pointers and pointer arithmetic, where pointers were untyped.

I do not know any tier 3

I do not know any tier 3 language (with tier 3 code and data structures) with pointers and pointer arithmetic, where pointers were untyped.

Forth isn't a language so much as an entire family. But there are plenty of Forths extended with heaps and pointers, yet without types. Any of those would be Tier 3 at least.

Forth with the tier 3 structures and code?

Forth isn't a language so much as an entire family. But there are plenty of Forths extended with heaps and pointers, yet without types. Any of those would be Tier 3 at least.

Any specific dialect with documentation in mind? I do not remember data structures in the forth dialect that I have used (only simple types). The addon package like that one are just a way to define global offset constants. The names are not local to the structure. From the doc:

Naming convention

The field names that come to (my) mind are often quite generic, and,
if used, would cause frequent name clashes. E.g., many structures
probably contain a counter field. The structure names
that come to (my) mind are often also the logical choice for the names
of words that create such a structure.

Therefore, I have adopted the following naming conventions:

  • The names of fields are of the form
    struct-field, where
    struct is the basic name of the structure, and
    field is the basic name of the field. You can
    think about field words as converting converts the (address of the)
    structure into the (address of the) field.

  • The names of structures are of the form
    struct%, where
    struct is the basic name of the structure.

This naming convention does not work that well for fields of extended
structures; e.g., the integer list structure has a field
intlist-int, but has list-next, not
intlist-next.

So, I would not call such strctures as tier 3 support in the language for data structures. This is just a way to define global constants rather than to define a data strcture with name local to that data structure. Addresses are treated as integers here. This address arithmetics rather than pointer arithmetic.

On code dimention I would say that the forth is a tier 2 language for stack machine. The loop constructs and conditionals are just clever stack manipulation operations at compile time, some compilers check balancing, but they do not have to. Loops, conditionals, etc - are macros rather than language constructs. There is no inherent decomposition of behaviour into subbehavior in the langauge.

Possibly some Forth dialects get past these problem, but we need to discuss a specific dialect. Languages develop while keeping names sometimes: FORTRAN 66 is a tier 2 language, FORTRAN 2003 is a tier 4 language.

re name locality and Forth

The names are not local to the structure.

Only at tier 4 - where we might aim to program against a contract that a structure or module has particular fields (cf. row-polymorphism or duck typing) will locality of field names become essential.

In many tier 3 languages, names resolve to global symbols. E.g. `x.foo` might resolve to `x.(structtype::foo)` and `xyzzy` might resolve to `modulename::xyzzy` or similar. Thus, in reality, all names are global but we might provide a convenient shorthand. In these cases, requiring distinct field names per structure such as `list-next` does not affect expressiveness or abstraction within the language. It only affects concision.

For a tier 3 designation, rather than concision, the more relevant concern is support for hierarchical behavior and data. Without some form of struct extension, it would be rather difficult for a Forth to support trees, lists, tables, etc..

On code dimention I would say that the forth is a tier 2 language for stack machine. (... loops as macros, etc. ...)

To me, it seems an unreasonable and counter-intuitive conclusion that a greater ability to abstract over loops and conditionals - to an extent that we can avoid concrete built-ins - would somehow reduce the abstraction tier of a language.

no inherent decomposition of behaviour into subbehavior in the langauge.

Other than decomposing words into smaller words?

Addresses are treated as integers here. This address arithmetics rather than pointer arithmetic.

Whether you choose to call them addresses or pointers is still irrelevant to this discussion, IMO. If your argument is a trivial reduction to "it isn't a pointer by MY definition of the word, because it's untyped" then I'm really not interested in hearing it.

Language vs Internal DSL

To me, it seems an unreasonable and counter-intuitive conclusion that a greater ability to abstract over loops and conditionals, instead of providing a few concrete built-ins, would reduce the abstraction tier of a language.

Loops are library rather than language, and we are discussing the language here. The execution model is sequential execution of actions (with two modes compile and runtime) where each action is independent unit from point of view of runtime. The tier 3 constructs are internal DSL rather than part of the langauge. Structure support is also internal DSL as well, rather than language feature.

For example, with macros, it is possible to create internal DSL for object-oriented programming in C (there is actually a book on it), and some systems like CORBA provided external DSL to support object-oriented programming in C. But existence of such tools does not make C an object-oriented langauge. Situation with Forth and structured programming is similar to situation of C and object-oriented programming.

The ability of known to me dialects of Forth to support hierarchical structures is the same as for Assembly language. Registers in assembler could hold addresses and they could also access memory regions using globaly specified offsets. Everything is compiled to the tier 2 representation eventually: flat machine code with global labels used for linking. So assembler is able to express constructs of any tier as DSL.

The point is that we cannot reason about them basing on the code. The abstractions provided by the language do not support any reasoning about structures. If push address of struct A with field a, we could than legaly invoke B_b on the address. The langauge or runtime will not provide us any hints that such access is invalid. From the point of the language we are just extracting a word relative to the address using flat memory model, A_a and B_b are just functions from integer to integer.

In contrast to it, Prolog terms will not allow to extract non-exiting term argument using pattern matching. JavaScript will return undefinied for unknown fields. In C stuructures, it is not possible to refer to non-existing fields as well. C arrays/pointers also address relatively to array location by index (there is a known design problem that pointer arithmetic and array access are unchecked in the name of performance, but some langauges support checked pointer arithmetics).

re macros

You say that C is not object-oriented, even though we could use macros to support OOP in C, per Schreiner's book.

It's a bit difficult to judge this statement (what does "object-oriented" even mean?). But do I posit that the reason that OO C isn't a popularly acknowledged thing may be more complicated than "because macros". It could be weakness of the tooling (development overheads, performance overheads, type support, debug messages, etc.), lack of effective distribution sources for the tooling, lack of advertising, or even a lack of interest and adoption by the target audience (OO doesn't solve a real problem, don't want to pollute existing code with new styles, those interested in OO already using C++, etc.).

You are projecting a particular interpretation onto a historical event - naturally, one that favors your argument. However, due to insufficient research regarding the actual cause-effect relationships, your assumption about the cause seems like hand-waving to me. Where's the post-mortem?

Anyhow, I reject your position that macros are inherently unqualified to contribute to the abstractive power of a language. If macros enable us to abstract 'patterns' like loops, conditionals, error handling, structure types, OOP, etc. into libraries, that obviously results in a higher level of abstraction than a language that limits us to a few built-in loop and conditional patterns.

The abstractions provided by the language do not support any reasoning about structures. If push address of struct A with field a, we could than legaly invoke B_b on the address. (...) In C stuructures, it is not possible to refer to non-existing fields as well.

Abstraction tier is just about abstraction. Why are you bringing safety into it? Forth developers laugh at safety. And C certainly allows us to pass around `void*` and cast to `B*` to "legally" use fields of the latter, even though it maybe should have been an `A*`.

Library vs Language

While the macros could be considered as part of the language, the notations created using macros are not the part of the language. They are part of the library.

I do not think that mixing language and library aspects is apprpriate when we discuss abstractions supported by the language. And only directly supported abstractions are interesting in the context of language evaluation. C does support object abstraction directly with exception of pointer to void that is extremely limited escape-hatch expression of black box concept. C macros do not support object abstraction directly. But there is an internal DSLs that support recording of object notation, but this DSL is part of library.

Forth also does not support structured programming directly, it supports it only as a library in versions of Forth that I have seen.

re Language vs Library

I agree that libraries aren't language.

However, abstraction tier of a language isn't about having a large set of built-in patterns. If a language has 40000 built-in patterns yet doesn't allow defining new ones, it's still only tier 1 - every program a specific, flat object.

Instead, tier 3 is essentially about our ability to extract patterns into libraries - named patterns hierarchically using named patterns.

Thus, it seems to me that you are in error when you argue that Forth is only tier 2 for abstraction because it's lacking some built-in patterns. Forth's ability to abstract such patterns into a library is strong evidence for a tier 3 designation. In contrast, C with its built-in support for a few structured programming patterns but inability to effectively abstract new loop and conditional patterns is arguably weaker at abstraction than Forth.

Re: Language vs. Library

I think we could just agree to disagree on language vs. library issue.

However, abstraction tier of a language isn't about having a large set of built-in patterns. If a language has 40000 built-in patterns yet doesn't allow defining new ones, it's still only tier 1 - every program a specific, flat object.

I'm not very big fun of fixed languages, and I even have own pet project create extensible C-family syntax. But I still think that language abstractions should be evaluated on what the language definition bring to the table, not by what can be created from it.

If we consider the high-tier languages with extensible syntax, there are two other well-known examples that support the tier 3 abstractions via library: it is Schema and Smalltalk. Both are tier 4 languages.

Schema in contrast to Forth is a true tier 4 language. It has native hierarchy on code structure organized with parenthesizes (so it naturally includes tier 3 too). Macros depend on hierarchical structure of the code and even provide hygienic macro support that depends on it. On data side we have primitives (tier 1), vectors and pairs (tier 2), references to other vectors and pairs (tier 3), and references to lambdas (tier 4).

In Smalltalk, all control flow constructs are implemented using blocks (which are basically lambdas).

In contrast to Forth, control flow in these languages uses tier3/tier4 structures provided by the language instead of building own control flow structure above language. So, Schema and Smalltalk extensions stay in the same logical level as base language, the Forth creates additional logical level above base language. I think it is a significant difference.

Also, Forth at base have only three global data structures: data stack (modified globally), return stack, and global flat memory. On data stack words are stored, on return stack code addresses are stored. And flat memory has get/set operations by address and assumed type. So, at the language level, there is no tier 3 structures, there is only three global flat structures. The dialect of Forth that I've worked with even did not had all these structured programming macros. I just had BRANCH and ?BRANCH and worked out some problem specific macros with them.

biased against minimalist languages?

I disagree with your position that all libraries should be excluded from your evaluation. Not suggesting you should include all libraries, either - that'd be absurd. Rather, when a language has defined or de-facto standard libraries observed in widespread use and common knowledge among its programmers, those should be included. Otherwise you have biased your evaluation of languages - especially extreme minimalist languages like Forth - in a way that is simply incongruent with common practice.

In the end, is it not the common programming experience that matters?

And the reason C isn't OOP is not "because macros" or "because libraries". Ultimately, it's "because most actual C programmers don't do that, have not encountered it, and really aren't all that interested".

Evaluation of libraries and languages

I think libraries should be excluded from evaluation of the language, but you could evaluate library itself. But this would not be "the language X provides tier 3 abstractions". It would be "the library Y for language X provides tier 3 abstractions". IMHO these phrases are completely different.

Minimalist languages based on low-tier abstractions has advantages of simple implementation. And their applications might be reasonable for very small applications. But their minimalism hurts when the behavior complexity grows, because of lack of built-in abstractions. Libraries help up to certain point and they cannot completely compensate lack of abstractions in basis.

This is not good or bad. This is a tradeoff. Forth might be good for embedded scripting, but it is not so good for writing online shops because of lack of suitable abstractions. On other hand, cost of abstractions of Ruby, Java, or C# makes situation a reverse: good for web sites, bad for small-scale embedded scripting.

re libraries help to a point

Libraries help up to certain point and they cannot completely compensate lack of abstractions in basis.

This is a rather strong claim. Why do you believe it true?

I grant that a system's foundation can restrict its growth. Can't build much on quicksand or brainfuck. Forth is rather awkward around concurrency. But, ultimately, we build entire skyscrapers using the lowest level atoms. I am not aware of any fundamental limit to building large programs and high tier abstractions upon a minimalist foundation. And, indeed, this is my own goal as a language designer.

Core abstractions are important

I could only wish you good luck with that. It will be great if you will be successful. I have not seen major successes with such approach, so I'm not so optimistic.

For example, the experience with Common Lisp/Scheme is not so optimistic. There are basic tier 4 data structures and rich macro systems, but the problem is tier 3/tier 4 basic data structures in the language are somewhat limited. There are numerous efforts to build struct/classes above them, but all variants that I have seen were less usable than traditional languages like Java or C# on many aspects. It is possible to create many interesting things using CLOS, but if we consider getting things done for typical applications - things are different.

Simple things in CLOS are more complex than in Java/C#, and the typical application consists mostly from simple things. This involves both surface syntax (that is too uniform in CLOS, so brain has to use more complex pattern matching) and concepts. There was no lack of propaganda for LISP/Scheme, so obscurity was the reason of failure to get traction.

Structured programming in Forth has a similar problem, the human pattern matching required from user is even more complex as there even less structures in the code to guide it.

While popularity of the languages involves many causes, the perceived productivity is a major factor. Being flexible is good for language, but not when it causes the brain of the application developer to flex too much as well. I think studying more successful languages and comparing them to less successful (in the case when people could choose on their own) could be a good way to understand how humans are working with abstractions. If language still fails to get a traction with reasonable propaganda and the long time period, then there is possibly something that goes against the development flow in the language. This means that other languages help to get things done better.

Developing a stack of abstraction from simplest ones might be good thing, but how does it help to get things done? Will simple things be simpler or more complex with that?

re core abstractions

Humans are biased toward concrete visualization and manipulations. Undifferentiated syntax doesn't play to our strengths.

However, within my goals, only the actual distribution-layer for code and data needs be minimalist. Projectional editing could allow specializing or extending notations where appropriate.

If language still fails to get a traction with reasonable propaganda and the long time period, then there is possibly something that goes against the development flow in the language.

Could be any number of things: awkward syntax, weak performance, poor debug experience, licensing issues and wariness towards vendor lock-in, too easy to adequately simulate features in existing languages, or perhaps the language is a solution in search of a problem.

how does it help to get things done? Will simple things be simpler or more complex with that?

With a minimalist language, simple things tend to become more complex, requiring libraries or projections for HCI, software accelerators for performance. The benefit is that we can truly understand the consequences and feature interactions of the language foundation. Thus, we can design towards nice systems-level properties, avoid conventional sources of accidental complexity, and arrange for conventionally complex problems to be simpler.

Internal DSLs and abstraction tiers

Could be any number of things: awkward syntax, weak performance, poor debug experience, licensing issues and wariness towards vendor lock-in, too easy to adequately simulate features in existing languages, or perhaps the language is a solution in search of a problem.

  • "awkward syntax" is exactly example of "something that goes against the development flow in the language"
  • "weak performance" is not a reason for failure, since a lot of languages with poor performance got popular (Ruby, Python, PHP, etc.)
  • "licensing issues and wariness towards vendor lock-in" - the same, there are open source implementations of LISP and Scheme. And there are a lot of them. Emacs is was built with LISP since ancient times, but it did not caused others to copy architecture.
  • "poor debug experience" - almost every new language with own runtime has this problem, notable exceptions are JVM and CLI languages that could use language-agnostic debug infrastructure of the runtime.
  • "too easy to adequately simulate features in existing languages" - this only means that these features are not unique benefits of the language.

So, LISP's failure to get traction could not be attributed to most of above reasons. I think the primary reasons are features of the language itself. While it is good at solving some intricate problems and DSL construction, it is somewhat worse at basic programmer tasks. Saving time by constructing DSL could only help up to some point.

With a minimalist language, simple things tend to become more complex, requiring libraries or projections for HCI, software accelerators for performance. The benefit is that we can truly understand the consequences and feature interactions of the language foundation. Thus, we can design towards nice systems-level properties, avoid conventional sources of accidental complexity, and arrange for conventionally complex problems to be simpler.

Note, you list direct benefits for DSL designer, not for the language user. The language users will be ones who spend most of time with language, so the language design should be optimized for them. You might be repeating possible LISP's mistakes, they have also optimized for compiler writing and DSL design. Even user create new abstractions, the user will spend more time using them than creating them.

The language user wants abstractions to interact smoothly and predictably, he/she does not want to understand and care about them deeply as the language user attention is busy by the task at hand. And more minimalistic the foundation is, the more chances for unexpected and undesirable interactions. For example, nesting errors in Forth for structured programming DSL could lead to quite unexpected failures, unless special care is taken in macros.

When we are staying at the same or lower logical layer as foundation language, we could rely on foundation language to do needed checks for abstraction interaction. When we build something above the foundation tier (using internal DSLs), we are on our own and usually get a lot of unexpected failures and we need tool support with multiple heuristics.

For modern example, dependency injection frameworks (Spring Framework, Dagger, and others) are implementation of the tier 5 concept, and their support as internal DSLs in the tier 4 languages is very poor (C#, Java). Getting unexpected runtime errors is very common. The developer has to keep in the mind a lot of implicit connections. And because there is no the tier 5 abstractions in the language, it is hard to use Spring and Dagger in the same application. The abstraction do not interact. In .NET tried to solve this problem at library level, but some DI framework developers find the solution to be worse than the problem itself.

re internal DSLs and abstraction tiers

I believe you're far too eager to attribute market failure to your pet theories without sufficient research and analysis. It's just handwaving.

And yes, the direct benefits of design minimalism are for the designer. But so are the direct benefits of a feature-rich language design - the ability to just casually add features as needed. Either way, there's never a guarantee of a nice language for the end user on the basis of language design philosophy.

language user attention is busy by the task at hand

That's my observation, too. It's why I prioritize "nice systems-level properties" - because the user is typically too busy with the trees to pay attention to the forest.

(... Forth macros unsafe ...)

Forth's design philosophy ignores the 'safety' issue, and emphasizes personal control. Where you complain that careless implementation of macros would easily lead to errors, they'd shrug and tell you to debug the macro.

You could easily make a case that Forth macros have a high risk of unexpected and undesirable interactions due to the design of the macro system. But attributing this to the minimalist foundation and attempting to generalize is just nonsense.

When we build something above the foundation tier (using internal DSLs), we are on our own

There are quite a few approaches to internal DSLs - e.g. monadic DSLs, ASTs as data types, syntactic sugar, projections, type-specific language per Aldrich's Wyvern, macros, fexprs, adaptive and extensible grammars, etc.. Several of these approaches operate within a language's 'normal' logic. I prefer such approaches because they interact nicely with safety, modularity, and further abstraction.

In any case, it would be incorrect to assume that mechanisms to extend notation are generally or inherently unsafe or unpredictable. You can only judge specific mechanisms of extension.

DSL extension direction

Forth's design philosophy ignores the 'safety' issue, and emphasizes personal control. Where you complain that careless implementation of macros would easily lead to errors, they'd shrug and tell you to debug the macro.

I complain if as user of the macro I'll do error in if/do nesting (for example, start if inside loop and terminate it after loop), I'll get unpredictable error, or no compile/runtime error at all depending on macro implementation, the code will just behave incorrectly. This situation could be reached with careless copy paste for example.

There are quite a few approaches to internal DSLs - e.g. monadic DSLs, ASTs as data types, syntactic sugar, projections, type-specific language per Aldrich's Wyvern, macros, fexprs, adaptive and extensible grammars, etc.. Several of these approaches operate within a language's 'normal' logic. I prefer such approaches because they interact nicely with safety, modularity, and further abstraction.

Approaches operate within a language's 'normal' logic are exactly what I mentioned as DSLs with abstractions of the same or lower-tier comparing to the base language. I just do not see how we could provide abstraction of higher-tier cleanly. The all attempts that I have seen were plagued by problems: structured programming in Forth, object-oriented programming in C, or system-oriented programming in Java.

In any case, it would be incorrect to assume that mechanisms to extend notation are generally or inherently unsafe or unpredictable. You can only judge specific mechanisms of extension.

The primary problem that I see is not a specific mechanism, but a direction of extension. It is easy to create safe extensions when we introduce abstractions of the same or lower abstraction tier since such extensions could rely on basic language mechanisms. But if we try to create abstractions of higher-tier than the base language as internal DSL, this way will be likely plagued by usability and/or compatibility problems.

re DSL extension direction

I just do not see how we could provide abstraction of higher-tier cleanly. The all attempts that I have seen were plagued by problems

Conventional language are not very good for concurrency, security, distribution, computing over heterogeneous systems, multi-stage programming, or extensible notation. Our conventional application models - basically `void main()` plus ad-hoc effects - are similarly weak for extension, composition, extraction, process control, security, distribution, and so on.

I would describe this as "plagued by problems" even before we contemplate a tower of abstractions. It should not surprise anyone who studies languages that our attempts to build atop such shaky foundations tend to crumble.

Where you've erred is assuming that evolving our set of built-in abstractions from lower-tier to higher-tier, e.g. from C to C++, is the best or only direction we can go. It's not a wrong direction, and it's been 'successful' for some measures of success, and it's immediately useful. But working towards a superior foundation that we can more easily build upon is also not wrong. And you can see a lot of that in the more academic circles or purely functional programming communiyies.

Naturally, you won't find people working on 'low abstraction tier' foundations that intentionally elide support for black boxes just to prove a point of dubious utility to some guy arguing on the internet.

But minimalist foundations like a combinatory logic, free monads, a simple rewrite system, or a temporal logic, can be very convenient targets when the goal is a solid foundation.

I complain if as user of the macro I'll do error in if/do nesting (...) the code will just behave incorrectly

This is true for erroneous use of any word in Forth. Forth has no safety nets. It's fine that you favor safety and complain if your language makes common errors difficult to detect. But it becomes a double-standard if you complain only when macros are involved.

In any case, if we augment the Forth development environment with linters or static analysis that can catch common typographical and structural errors, it wouldn't change the abstractions or notations involved. Thus, I ultimately feel that this entire digression about 'safety' is irrelevant. You're trying to make a point of a property that is simply outside the scope of 'abstraction'.

Engineering vs Math

From what I see, we are looking at the programing languages from different points of view.

I have an impression that you are looking at programming languages from point of view of mathematics. Like a mathematician you are trying to build complex things from simple things. I’m looking from point of view of engineering, I consider languages as products to be used by humans, so I care more about usability and other things than about mathematical foundations. So, my arguments from engineering standpoint do not reach you as you are considering them as irrelevant, and your arguments from pure math standpoint look incomplete to me as you seem to be ignoring humans. Much of our discussion could be reduced to the famous saying:

In theory, there is no difference between theory and practice. But, in practice, there is.

In my view, abstractions are created by humans and for humans. This is just a tool organize complex behavior of the applications. Computer does not need these abstractions. There is natural path how humans create and adopt abstractions. This path could have mathematical structures underneath and could be described in mathematical terms, but ultimately, it is a feature of human neural network rather than mathematical abstraction. The mathematical model of that path is a formal description of that human path rather a primary source of truth on the matter. So, I’m trying argue via human behavior, you are trying to counterargue via mathematics. Our arguments are just about different things.

Also, mathematics and engineering have opposite view of humans with respect of the reasoning process. Mathematics tries to abstract out human out of reasoning process because humans are unreliable and prone to errors. Engineering has to acknowledge humans in reasoning process because humans are unreliable and prone to errors and engineering artifacts have to be human-compatible.

As for programming languages, we could see natural selection in the form of generate and test. A lot of programming languages are being created. Some stick around and some go into obscurity. From my point of view the current result of this process is significant. Because we could use this result to get insights into human-computer interaction. From engineering point of view, the popular languages have some HCI properties which made them more popular. So, they are more human-compatible in general. The obscure languages have some HCI properties that drove them into obscurity or niche usage. So, they are less human-compatible in general. The single language might be a not so good indicator as too many accidental things might have happened, but group of the languages with a similar feature set is a better indicator. For example, we could see the following results:

  • Generic LISP-like syntax has lost to specific syntax with a lot of keywords. This does not indicate that the idea of extensible syntax is lost as well, but it does indicate that it should not be done in the LISP way. For example, XML language extensibility architecture is quite alive. The specific implementation of the architecture is not suitable for programming languages, but structure of the architecture might be.
  • Functional programming lost to object-oriented programming and useful parts of functional programming were assimilated by object-oriented languages: lambdas are assimilated directly, immutable structures as a design pattern w/o extensive language support. There still are obscure islands of resistance in the form of functional object-oriented programming, but I think they will be further forgotten on the next paradigm shift.

The reason for situations above could be discussable. But still, these are important facts about languages and humans. There is possibly some mathematical explanation of why this happening, but the fact is mostly about humans rather than about mathematics of the languages.

Building a programming language from basic pieces and introducing more and more complex abstractions looks very good from mathematical point of view. This is actually the way mathematics works. Because mathematics care about conclusions and the way the conclusion is reached.

From engineering standpoint, one should care about abstractions directly used by humans (so we work with conclusions only). For humans, it is not important how abstractions are reached at the point of time when they are working with them. They even do not have to be logically consistent or complete, because it is possible work around corner cases with hacky features. The have to be usable and smooth on majority of use cases, some obscure use cases might require ugly code. For example, array typing in Java is quite horrible from mathematical point of view, but it works for majority of use cases, the rest we get runtime errors. Engineering about balancing tradeoffs rather about reaching perfection in every aspect.

Also mixing in the same system abstractions from different tiers might look good from mathematical point of view, but in the practical engineering it is not so. Abstractions from lower tiers sometimes have to be sanitized to work well with abstractions of higher tiers. For example, “go to” was sanitized to “break” and “continue” in the tier 3 languages, additional semantic checks were put on “go to” usage where it was left. So, “go to” in the tier 3 language is different from “go to” of the tier 2 language in semantics. From math standpoint, two “go to”-s might be practically the same, but from engineering standpoint the tier 3 “go to” is different as its’ compilation has to be context-aware, while the tier 2 “go to” compilation is context-free.

As another example, data structures were sanitized in the tier 4 languages as well. Most of the tier 4 languages do not allow to inspect memory layout of the object via normal language means and most are using garbage collection. C++ has not done it, and it is very easy to make mistakes related to memory in C++.

I doubt that it is possible to create a usable language without such sanitation process when introducing higher-tier abstraction. This is not a math problem, it is usability problem. But if do sanitation process, we are practically creating a new language layer, so the previous language layer is hidden.

faith in the market

I take multiple perspectives on language.

What I don't do is conflate all the various concepts (abstraction, safety, performance, convenience of notation, path dependence, evolution, history, marketing, etc.) into one jumbled mess in my head to the point that my arguments on the subject of "abstraction tiers are just about abstractions" become incoherent hand-waving towards topics that are very obviously not about abstractions.

We could certainly discuss such factors. But it would be illogical to do so as part of an argument about, for example, whether pointers are required for tier 3 abstractions. Context matters.

Anyhow, it is my impression that you have too much faith in the market to pick winners based on nice HCI, abstractions, and other technical merits. Whereas, I see quite a few influences on language success that are more explainable in terms of path-dependence and metastability - momentum, integration, tooling, history, sunk costs, training, human resources, etc.. HCI is at best one weak, indirect factor among many.

Basically, I think your entire argument regarding popularity and HCI is only weakly related to actual history.

You could go study 'why Lisp failed' and you'll find dozens of hypotheses that the writer believes is reasonable, and yet seem clearly based more upon agendas rather than evidence. Feel free to add your pet theory to the list.

People, not Market

What I don't do is conflate all the various concepts (abstraction, safety, performance, convenience of notation, path dependence, evolution, history, marketing, etc.) into one jumbled mess in my head to the point that my arguments on the subject of "abstraction tiers are just about abstractions" become incoherent hand-waving towards topics that are very obviously not about abstractions.

The language is evolving system. To consider it, one has to consider both human/culture side of it, language itself, and hardware. You want to consider things in isolation. It works up to certain point. I'm trying to consider the entire system and system factors that affect the languages and abstractions. Considering languages and abstractions without human factor is just useless. Languages and abstractions are only useful for humans.

For example, in your SLRP is just hand-waving from my point of view, because you have not demonstrated system connections. There is an infinite number of possible mathematical models of computing that Turing-complete. So, good general-purpose architecture needs to have a wide class of task solvable with it (almost all practical tasks + some new ones) and it has to be efferently implementable in hardware. For your architecture, you have provided neither. You have not demonstrated that it is more convenient and efficient to solve classical tasks in it like writing website or modelling nuclear blast. You also dodged the question about hardware by specifying that it is possible (the real question is whether it is more cost effective for tasks than current hardware, programming languages are supposed to be targeting this architecture would feel better on von Neumann hardware). There was just some hand-waving about “security, concurrency, distribution, and scalability” without any demonstrations. So, from my point of view this is a typical toy architecture that is completely unrelated to practice at the current moment of time, and it will possibly never be.

You have written a big text on SLRP, but you did not answer a single question I cared about: why people would like to write in the languages created with such architecture in mind? This question has a lot of sub-questions. Would they like to write a simple web site in it, or would they find even PHP to be more productive? People are clearly capable to use the language stack based on von Neumann architecture, will they be able to solve their tasks better basing on your architecture? Would it be easy to learn these languages and what is the learning curve?

I could guess the only one class of tasks that possibly match architecture somehow: complex event processing. This is very narrow domain and if the language is working well only for it, it could not be really called general-purpose whether it is Turing-complete or not. This narrow niche hardly needs own hardware architecture, as it is working well enough on von Neumann hardware.

Anyhow, it is my impression that you have too much faith in the market to pick winners based on nice HCI, abstractions, and other technical merits. Whereas, I see quite a few influences on language success that are more explainable in terms of path-dependence and metastability - momentum, integration, tooling, history, sunk costs, training, human resources, etc.. HCI is at best one weak, indirect factor among many.

It is not "market" that select languages. People select languages. People select languages that they think are best for the task at the hand. This include a lot of factors, but for simple short application it is just like/dislike of single developer. And like/dislike is an overall HCI evaluation of technology from the human point of view. Considering current OSS culture, any technology could take up quite quickly if it proves its merits on short projects, and it is pushed to larger projects if people liked it.

It is the case Java took its place in mainstream not because it had a lot of libraries. It is Java got a lot of libraries because a lot people like to write in it, and then more people jumped in because there are libraries. I personally liked Java very much after the switch from C++. Java 1.0 beta was slow and buggy, it lacked tooling, there were no GUI debuggers, there was vendor lock-in, there were no third-party libraries, but it was much better for humans than C++ or Object Pascal. It took over like a storm. I’ve personally pushed Java into projects that could afford Java’s disadvantages in company where I worked as developer, and since there were successes and developers liked the language, managers started to push Java after it.

OSS technologies found their way into people like despite of initial lack of support from big players: php, ruby, python, lua, etc. There are a lot of examples of grassroot languages to achieve success. It could be argued that people picked these technologies because these technologies were easier to learn due to similarity to other languages. But this is part HCI as well, since people are not born with programming knowledge in mind. The technology has to have a good learning path.

On other hand Scheme did not caught up despite of its push via academia and its use as teaching language. Other functional languages like ML or Haskell also failed to catch up despite of their push via academia. Me and a lot of people I know tried the languages, and these languages did not resonate with tasks we had. They force own way of doing things rather than helping to solve the task. And this is an HCI problem too, instead of solving task I had to do some unrelated rituals in these languages.

So, your argument about “path-dependence and metastability” vs. HCI does not hold water well. “Path-dependence” is HCI factor, since nothing forbids functional languages to provide own learning path if there is one. If there is no easy-to-follow path, this is an HCI issue. “Metastability” is not issue at all right now, since tooling, libraries, etc. are proven not to be unsurpassable barrier on many examples.

Basically, I think your entire argument regarding popularity and HCI is only weakly related to actual history.

You could go study 'why Lisp failed' and you'll find dozens of hypotheses that the writer believes is reasonable, and yet seem clearly based more upon agendas rather than evidence. Feel free to add your pet theory to the list.

I more believe in people's choice than “agendas”. “Agendas” could force their way up to the point. The times when one big company could force the choice of the programming language for the entire industry are long passed. There are a lot of architects that like to experiment and adopt new technologies. If the choice is successful, it is replicated by architect itself and by his/her colleagues in future projects. This is how evolution of the programming practice happens. While there are a lot of random factors in such evolution, eventually it selects what is working well enough for solving tasks. This is a generate and test process. People are not enemy of themselves. If you want some better language to propagate, create it, use it, and advertise results. Some will try to mimic success even despite of learning barriers, if there is really a success to mimic.

I'm trying to consider the

I'm trying to consider the entire system and system factors that affect the languages and abstractions

Sure. But not only are your attempts mostly BS due to lack of evidence, but doing so is just way off topic, and therefore irrelevant to the discussion. Might as well consider how the price of tea, or quality of air, or the future extinction of insects and collapse of the ecosystem affects abstraction. I'm sure you can fabricate more answers.

It is not "market" that select languages.

It does, in the sense the market selects anything. Can't really discuss Java's or JavaScript's rise without looking at the market forces involved.

People select languages that they think are best for the task at the hand.

Based on what evidence do you say this?

If it's between a language that's best at a task vs a merely adequate language that I know well and have installed, I know which one I usually choose.

“Path-dependence” is HCI factor, since nothing forbids functional languages to provide own learning path if there is one

Wat? Path dependence isn't about learning. Though existing knowledge does contribute to it.

People are not enemy of themselves.

Is that so? Then why do they keep shooting themselves in the feet?

Market forces are limited

But not only are your attempts mostly BS due to lack of evidence

I could only return it to you. You do not provide evidence either, and when you provide, you sometimes provide unverified evidence: for example, VHDL which you claimed to lack pointers. But I guess it useless argue about it with you. You are trying to consider abstractions abstracted from humans, I'm more interested why humans are getting them and their connections to humans. And this is essentially connected to usability, popularity, and other issues.

It does, in the sense the market selects anything. Can't really discuss Java's or JavaScript's rise without looking at the market forces involved.

For JavaScript, market forces had significant role, as browser vendors controlled the programming language market. For Java, there were no such control on the server-side and fat clients at early stages. Java gained control much later, because it offered what people needed. People were relatively free to choose suitable server-side technology, and they chose alternatives quite often. Some people even have chosen LISP or Haskell.

People select languages that they think are best for the task at the hand.

Based on what evidence do you say this?

If it's between a language that's best at a task vs a merely adequate language that I know well and have installed, I know which one I usually choose.

I implied "are best for them for the task at the hand.", but I guess this implication did not get across.

Being "best for them" includes evaluation of many factors: learning cost, usage cost, developer's cost, future investments, interest etc. Many people I know try something new for run-once small tasks or pet projects just because it is new and promising and it would make otherwise boring task more involving. If the language does not provide a way to do simple tasks in simple way, why would anyone bother learning it?

Is that so? Then why do they keep shooting themselves in the feet?

For humans, it is possible to shoot themselves in the feet using any technology, there are just different ways to do that. In Haskell, it is possible to do so using lazy evaluation and infinite structures, for example. That is if one will get past compilation errors.

For any abstraction tier, humans are able to use it to organize behavior complexity only up to certain point, after which they start shooting themselves into the feet casually, using new tools. Human mind is limited and specific abstraction could help it to handle complexity up to the point.

re market forces

Market forces - including investment, incumbence, job opportunities, and mind share - are vastly more significant than you make them out to be.

For every successful PL, there's likely over a hundred you've never heard of - many very nice. Most die in the PL cradle. Not usually because they lack merit or good notation (if PHP can succeed, that bar isn't terribly high). But rather, because they lack advertising, investment into tooling or performance, libraries, documentation, or because the incumbent PL for the niche is adequate and people need to maintain code and don't want to deal with two languages, etc..

Market forces kill over 90% of PLs that could have been contenders before they ever get a chance. Success has a fair level of chance involved unless a large company backs the language.

Market forces are strong due to path dependence. Past language choices influence future language choices, due to learning, maintenance, integration. Language popularity influences investment into libraries, performance, debuggers, etc.. Investment makes it difficult for a new language to join in.

VHDL which you claimed to lack pointers.

This was a misunderstanding on your part. I was countering a claim that tier 3 abstraction *need* pointers. With VHDL, we can have hierarchical abstraction without pointers, and compile to hardware that also has no pointers.

Your counterpoint that VHDL *supports* pointers was and remains irrelevant. Support for pointers is simply not the same as need for pointers.

You do not provide evidence either

The core of my arguments in this discussion are based on definitions. For example, you define tier 3 abstractions in a certain way, and there is nothing in that definition that explicitly or implicitly requires pointers.

Evidence shouldn't be part of a discussion based on definitions.

Indeed, you'd need to say something unnecessary and irrelevant - such as commenting on the evolution of languages or the effect of syntax on language popularity or the reason lisp failed - before evidence is required. (Of course, if you did provide evidence, it'd *still* be irrelevant. But without evidence, it's also bullshit.)

Your interest in those factors is fine. They interest me, too. But I don't recall those factors being part of your abstraction tiers.

For Java, there were no such control

Control wasn't the only market force involved that I'm considering.

Being "best for them" includes evaluation of many factors

Indeed. And many of those factors are ultimately integrated with those market forces. Notation or technical merit for a task are relatively weak factors when placed among them.

Of course, as a realist, I also don't believe people usually do what they think is best for them. Like, I know that arguing with you on the internet has no value to you or me, yet costs my time and energy.

abstracting people into market forces

Market forces kill over 90% of PLs that could have been contenders before they ever get a chance. Success has a fair level of chance involved unless a large company backs the language.

Success involves a quality (as in "suitability for purpose"), chance, and persistence. Let's take Haskell as example. Haskell community has persistence. And it had a lot of chances since Haskell propaganda was quite strong few years ago, even on Java conferences. A lot of people that I know tried Haskell or other functional language. That propaganda still failed to bore a fruit for Haskell. Only quality is left to discuss.

Market forces are strong due to path dependence. Past language choices influence future language choices, due to learning, maintenance, integration.

Integration is now mostly non-problem. In age of REST services, if the language supports HTTP it is in a good position. The minor problem is support for databases and other EIS sources, but creating a minimal driver for open-source database like Postgres is about one man-month in a reasonable language.

Maintenance (if for the language) itself is non-issue if we consider language startup as there should be enthusiastic core. Also, older languages like Haskell have no problems to demonstrate maintenance. For run-once programs it is completely non-issue. If you mean maintainability of programs written in the language, then lack of maintainability is an issue with language. But Perl is survived for long and it only started to fade out quite recently.

Learning is the issue with the language. The functional languages feed high-tier abstractions to the language user from start. Even to write a "Hello, World!". My first shock when learning Haskell was that tutorial referenced Category Theory when I just wanted to write a simple program. This was a bad learning path and this is a problem with the language.

Indeed. And many of those factors are ultimately integrated with those market forces. Notation or technical merit for a task are relatively weak factors when placed among them.

Notation or technical merit for a task is relatively strong factor for run-once tasks and pet projects. And languages get people hooked via this channel. If the language helped to get things done on the first time, it will be likely used on the second time. Adoption path starts with small things.

And you are abstracting people into market forces and you naturally lose power to change situation. This is a bad marketing strategy. In marketing, people are separated into different categories by common concerns and the specific concerns of target categories are addressed one by one. It is impossible to understand market as whole. It is possible to understand specific people. It is possible to understand concerns of groups of people if group is focused enough.

Haskell marketing was quite good IMHO, as it promised a lot of good things that I cared about. The problem was that these good things were encumbered by multiple things that stayed between me and getting things done. So, Haskell failed to deliver to me on promises of better productivity.

I think that this marketing was one of factors that forced OO languages to finally adopt lambda syntax as shortcuts. Almost all good things from Haskell I could now get in Java now with less efforts. There is still lack of pattern matching, but it is available in Kotlin and Scala and it is coming to Java too. IMHO Scala version is even better than Haskell's one.

re abstracting people

you are abstracting people into market forces and you naturally lose power to change situation. This is a bad marketing strategy. In marketing, people are separated into different categories by common concerns

It seems you imagine "market forces" as some homogeneous thing. But I've already mentioned some of the differentiations, like use of the word 'niche'. Understanding different roles a language could take, beachhead strategies, separating people by common concerns, etc. is still a form of "abstracting people into market forces" according to how I use the phrase.

Integration is now mostly non-problem. In age of REST services, if the language supports HTTP it is in a good position

This has helped, I agree. Future web assembly should also help.

If you mean maintainability of programs written in the language

No, I mean maintenance of existing codebases. IIRC, new projects are far less than 10% of programming jobs. And even for those, it's difficult to adopt a new language because managers must consider prospects of future maintainers to work on several projects (e.g. all the old code is Java, now Fred wants to use Scala for the new module/project, but do I really want to require programmers who know both Java and Scala after Fred leaves? No, I'll tell him to use Java.)

Notation or technical merit for a task is relatively strong factor for run-once tasks and pet projects. And languages get people hooked via this channel.

Individual or one-off tasks are a viable beachhead. That's where Perl got its start, IIRC. Unfortunately, merit for pet projects and one-off tasks does not imply merit for large or long-lived projects.

Ideally, a language would be great for both. That's a goal of most language designers. But it's a rarely met ideal.

Thought: I wonder what % of people and career programmers have pet projects or program at home each year, and by age. But stats seem hard to find.

Learning is the issue with the language. The functional languages feed high-tier abstractions to the language user from start.

This is certainly an issue for some functional languages, including Haskell. Fortunately, newer approaches based on algebraic effects (free monads plus row polymorphism minus the 'monad' branding) offer a much lower barrier to entry in this respect.

Observers

You're missing an important aspect, here: There is no observer of the Turing machine head's location within the Turing machine system.

The Turing machine itself is the observer, and the tape is the thing observed. The tape is not an active participant, the tape does not 'move', the tape does not change its symbols. The tape is just a serially accessable data store.

We can define natural numbers by starting at zero and adding one, and by induction we get every natural number. Likewise with the tape we can start at the current position and by induction over increment and decrement we can define all positive and negative integers (assuming an infinite tape).

So we can assign an address to every tape location. We can now refer to the current location of the Turing machine head, which is the address the head is pointing at. Just because we can only change this pointer by incrementing and decrementing does not stop it being a pointer.

re Observers

Turing machine is well defined. It observes exactly two things in each step: a scanned symbol and a state register. Both of these are from finite alphabets. The tape is infinite.

The details of your setup are unclear to me. I assume by "we" you mean the machine. But if your machine is to literally assign a symbol representing an address to every location on the tape, or otherwise track the location of the reader head, then how do you resolve using a finite alphabet to represent infinite addresses?

Also, in general, Turing completeness gives us enough power to represent any deterministic, confined computation, even a von Neumann machine with pointers. I naturally exclude such simulations of pointers from the question of whether the machine itself is using pointers. It's a matter of scope: a property of the program vs a property of the machine.

Yet, it seems you're attempting to develop a reflective program as the foundation for an argument that the machine is observing its own location. Why do you believe this form of argument is reasonable? Is it your belief that machines should be attributed with the properties of all programs they could potentially run?

Finite alphabet infinite address

The 'proof by induction' shows that this is the same for natural numbers. With a finite alphabet (0-9) we can represent infinite numbers, as there is no limit to the length of a number (an infinite tape can store any number no matter how large and still have room for the rest of the program).

However the argument that the machine was the observer was not a reflective argument. It was simply a statement that the machine observes the position of the tape. This statement is trivially true, unless you want to invoke the "humans are special" and only a human (or living being) can act as an observer in a quantum mechanical sense? Even then if we take the multiverse interpretation of QM and have the machine observe atomic decay instead of a tape, the machine is split into multiple copies when it observes/does not observe the atom decay, and multiple machines exist in different universes with different internal states.

Philosophically one could say that we exist because of the split between observer (the internal mind state) and observed (the external universe state). If you collapse everything down so there are no separate observer and observed, time and the universe cease to exist.

In any case, you make a good argument, certainly I could imagine some kind of streaming system with no pointers, but I don't think it would be very practical without any storage. I am still not sure I would count it as general purpose without general recursion and recursive data structures.

the machine observes the

the machine observes the position of the tape. This statement is trivially true

I believe you mistakenly conflate observing a symbol at the reader head with observing position of the reader head. That, or you aren't carefully using 'observe' in the sense of observable properties.

I could imagine some kind of streaming system with no pointers, but I don't think it would be very practical without any storage. I am still not sure I would count it as general purpose without general recursion and recursive data structures.

The SLRP stream can be stored, e.g. to tape, it's just outside control of the process (modulo effects). SLRP supports general recursion - the article describes anonymous recursion via strict fixpoint combinator. SLRP also supports recursive data structures via Mogensen-Scott encodings (look it up), as briefly mentioned in the article.

Why "imagine" a straw-man streaming system to disparage when you could refer to SLRP, which I've argued to be general purpose? I mean, besides the typical rhetorical reasons for fabricating straw-men.

Imagine De Bruijn

Why "imagine" a straw-man streaming system

I Imagine because I do not have a concrete working example on my table. A description of the language it uses is not an implementation, otherwise what do Intel and AMD spend millions (billions?) of dollars designing? an x86 is not the x86 instruction set, its easy to create an instruction set for an imaginary processor, its much harder to produce an implementation. If you produced an RTL level VHDL (or Verilog) description of the processor then I would not have to imagine, I could step through in a simulator, or burn into an FPGA and use it.

The SLRP stream can be stored, e.g. to tape, it's just outside control of the process (modulo effects)

This is the typical "get out of gaol free" card that people play with functional programming. Simply declare the state outside of the scope of interest and it magically disappears. I do not accept this as a valid argument. If your system can store the program you _must_ include the design and implementation of the storage inside the system.

SLRP also supports recursive data structures via Mogensen-Scott encodings (look it up)

I am familiar with church encodings, but I had not seen MSE before. Whilst its a pretty neat party trick, its moving further from what is actually implementable or usable. For a start it relies on having a arbitrarily large lambda term, which would therefore not fit in any kind of 'register', it is also very inefficient.

I don't think I could actually write any kind of large program using MSE, and I don't think it would make for an efficient implementation. It seems interesting maybe as an intermediate language, or an academic curiosity, but maybe I need to look at it a bit more.

It is clear there are two models of computation, the mathematical model based on lambda calculus, church encodings etc; and the practical based on the von Neumann architecture.

I am happy to refine my original statement to: Pointers are fundamental to efficient and practical computing.

Also whilst the two models are of equivalent computational power, the question remains are they on the same level of abstraction. I think there is a good argument that pointers are still 'fundamental' to the von-neumann model - which is in some sense isomorphic to lambda calculus. I don't see any argument that lambda calculus is somehow 'more fundamental'. This seems similar to the equivalence between type theory, logic and category theory.

So perhaps I can be stronger, and say that I still think "pointers are fundamental" is true, in the same way we can say "true" is fundamental (in logic), its just that a "terminal object" is also fundamental (in category theory).

What is perhaps more interesting is that we might be clear that pointers are fundamental in 'physical computation' (von-Neumann is not quite right as it is too specific to a particular architecture), it is not clear that 'MSE' is fundamental to lambda-calculus. If anything I would say it is functions that are the "equivalent" of pointers, as they "name" and "refer" to other (maybe unknown) expressions. If we use De Bruijn notation perhaps the similarity becomes clearer, as the De Bruijn number is effectively an 'index' or pointer into the argument list.

We could even argue that lambda calculus is already far too abstract to be considered fundamental in any way. It relies on variable naming and scoping rules that are not trivial to implement. De Bruijn numbers certainly seem less abstract than the complexities of variable naming. I think there is a reasonable case for considering a De Bruijn number a scope relative argument pointer.

de Bruijn vs combinatory logic

Everything in the SLRP document is much less complex than x86, and can be implemented with fixed memory. You can implement it if you want. I'm not going to accept an unreasonable burden just to prove a point to someone who has the gall to be wrong on the internet.

Simply declare the state outside of the scope of interest and it magically disappears.

The stream is the process state. You asked about storage. State isn't the same concept as storage. Though, I do understand that the two are easy to conflate in context of conventional computing models.

Storage is separated for good reasons. A stream can easily be *pipelined* through tens of physical processors for parallelism and performance, but only if processors don't know where the stream is sourced or going. Further, conventional notions of storage (e.g. to an external tape or memory) aren't even a necessary component of SLRP systems. We could just indefinitely circulate an effectful 'operating system' process through tens of processors, so long as we arrange for the total bounded buffers to be larger than our stream.

arbitrarily large lambda term, which would therefore not fit in any kind of 'register'

Yes. The SLRP machine code in the article is essentially a recipe for general computing with 'lambda terms' that are larger than 'registers'. Or, technically, a combinatory logic (one that easily converts to or from lambda calculus), where combinators may be much larger than a fixed processor memory.

it is also very inefficient

I disagree. Like a '\xyf.fxy' to encode the 'pair' constructor is only eight characters, and its runtime performance is acceptable. I grant the overhead is significant for 'small' data. I wouldn't want to use it for low-level numbers and binaries, for example. Thus, SLRP article provides an overview for how to handle those using dedicated operators. It'd also be trivial to dedicate a few operators for a small set of common constructors - e.g. pairs, sums.

The reason pure SLRP is inefficient has mostly to do with the lack of 'pass by reference' data plumbing. Not the encoding of data.

Regarding "efficient and practical" physical computing. Computation without references can be "scalable and practical" by leveraging physical parallelism. Arguably, it might even be more 'fundamental' from a physics perspective. But I grant that it will probably always be more efficient to talk about big things using small references rather than cloning or physically moving the big thing.

The SLRP article sketched a 'virtual memory' adaptation where we could replace a volume of the stream by a placeholder reference that can be loaded on demand. This is weaker and purer than mutable state and does not affect the expressiveness of SLRP (e.g. we could easily use content addressed storage with secure hashes). But it is indeed very convenient for efficient computation.

So, I won't argue with you on the 'efficiency' point.

I would say it is functions that are the "equivalent" of pointers, as they "name" and "refer" to other (maybe unknown) expressions

No. A lambda contains an expression like a shoebox may contain shoes. Container vs contained is not separated like reference vs referent. Containment isn't reference.

lambda calculus (...) relies on variable naming and scoping rules that are not trivial to implement.

That's why I favor combinatory logic. The power and expressiveness of lambda calculus (we can still use Church encodings, etc.), but none of the messy variables.

there is a reasonable case for considering a De Bruijn number a scope relative argument pointer

Your personal definition of 'pointer' seems to be much broader and vaguer and less meaningful or useful than how the word is used in programming. A variable usually isn't considered a reference/pointer unless it's shared and mutable, in which case we need the concept of 'references' (e.g. as opposed to physical rewriting or substitution) to explain why mutation is observable non-locally.

In any case, use of combinatory logic neatly avoids this hand-wavy argument of yours.

Combinatory Logic

A variable usually isn't considered a reference/pointer unless it's sharable and mutable

Knuth on p233 of TAoCP gives pointer as synonymous to link, reference and address. There are no requirements on shareability, mutability or aliasing. A counter example is trivial to find, the next element link in a singly-linked-list is obviously a pointer, and in a functional language such structures would be immutable, and unaliased.

I also don't see any evidence that SKI combinators are 'more fundamental' than pointers. A Random Access Machine is not implemented using combinatory logic, so it does not depend on it.

I think the conclusion is that both pointers and combinatory logic are fundamental, but in different models. All you have shown is that there is a different parallel model, but this parallel model is not more fundamental so I don't think its a counter argument to pointers being fundamental.

re pointers, and 'fundamental'

Perhaps it hasn't occured to you, but links, addresses, and references are indeed language constructs that we can share within a language.

Further, there is an essential distinction and separation between the address and addressed, link vs target, reference vs referent.

When I give you a list {1,2,3} in a purely functional language, there is no reference semantics, no separate 'referent'. It's just a value. (No matter how it's implemented under the hood.)

Mutability isn't a requirement for references in general. It only is necessary in specific contexts, such as variables, to force a 'this must be a reference instead of a value' semantic interpretation.

don't see any evidence that SKI combinators are 'more fundamental' than pointers.

Who argued combinators are 'more fundamental'?

Pointers can trivially be be 'fundamental' to von Neumann models or the C language or other models defined in terms of pointers.

However, because counter-examples like cellular automata exist, you cannot reasonably claim pointers are 'fundamental' to lower levels such as physical computing.

To clarify, I'm using the natural language meaning for 'fundamental' - "a necessary base or core, essential". One counter-example is all it ever takes to disprove a 'necessary' condition.

Then you added a 'general purpose' requirement.

Now, I am inclined to grant that cellular automata and other Turing tarpits are arguably too awkward for general human purposes. However, lambda calculus or combinatory logic is quite suitable. Thus, SLRP serves as a counter-example to your adjusted claim, "pointers are fundamental to general purpose physical computing".

I'm sure that if you move your goal posts precisely enough, you could achieve a claim that I'd broadly agree with. Something close to "pointers are fundamental for efficient, centralized, general purpose physical computation over large objects".

In any case, I've never argued that combinatory logic is somehow 'more fundamental' than pointers to physical computing. Rather, based on the same form of argument, the existence of von Neumann architecture would similarly be sufficient to disprove that combinatory logic is fundamental.

Note: I did posit that "computing without references" is arguably more fundamental to physics. However, I wasn't alluding to combinators. IMO, high level language and logic constructs like references or combinators are much higher abstractions than the fields and forces of physics. Cellular automata seem much closer to physical computing than anything humans would want to program.

Also: please do not conflate 'combinatory logic' with 'SKI combinators'. To me, that's about as cringeworthy as someone saying 'BASIC' to refer to 'imperative programming' might be to you.

the conclusion is that both pointers and combinatory logic are fundamental

No. If you mean "fundamental to general purpose physical computing", and if 'fundamental' means necessary or essential, then the correct conclusion is that *neither* is fundamental.

Neither?

Your argument seems to be that neither can be considered fundamental because the other exists. This seems wrong to me. It is like saying none of the legs of a table are fundamentally required to have a table, yet take all the legs away and you very definitely do not have a table.

Neither.

Legs aren't fundamental for tables. Tables could be attached to a wall, or suspend from a ceiling. Perhaps in a sci-fi future, they could just hover. What you might say is fundamental is "a mechanism to remain above ground level" or something like that. Legs would qualify as such a mechanism.

Similarly, for centralized processing, you might reasonably say we need "a mechanism to represent processor attention" (or focus). Because, by definition, centralized processing cannot operate on the whole computation at once. Pointers are one such mechanism. But so are mobile agents, or stream processing, or perhaps mechanisms I've never imagined.

Of course, you could say that legs are fundamental to legged table models. Or that pointers are fundamental to von Neumann models. But we should always be cautious about generalizing, especially when we don't even know all the possible models. We humans are much better at fallacy than we are at logic, and faulty generalization is a common one.

Fundamental

I don't think its right to say neither are fundamental. The problem is they are disjoint. For example we might say both rails and roads are fundamental to public transportation. The difficulty we have is that there is no direct equivalent to pointers in combinatory logic. We cannot (quite) say "both pointers and combinators are fundamental to computation" but perhaps that's the best we can do?

Faulty generalization

The form of your argument would lead to such absurdity as 'finger painting is fundamental to art' as a whole just because it's the essential core of one form of art, or that 'thatch is fundamental to roofing' just because it's one kind of roofing.

The problem with your "rails and roads are fundamental to public transportation" analogy is that it's poorly qualified. You have many implicits in there, like 'today' or the entire economic context with which you're defining 'fundamental' (why not also claim personal helicopters are fundamental?). As written, without any qualifiers, it's simply wrong. For example, we only need to look back a few hundred years to find a time where rail wasn't even an option.

Pointers are 'fundamental' to computation in some cases. Try to figure out the qualifiers.

MSE

MSE requires the fixed point combinator, how do you define the fixed point combinator in combinatory logic?

re fixpoint combinator for MSE

The article has a paragraph on the subject:

A strict fixpoint combinator: `z[F][X] == F[z[F]][X]`. We could implement this by hand: `z = ic~[iwba[c]b[ic~a[~]]]~`. However, a dedicated loop combinator can reduce overheads and simplify recognition of loops by a processor that might want to perform internal optimizations. If the function operand is too large, we can rewrite `z[F == ia[o[z]q]c~[F`.

Here ~ is a combinator defined in the article's prior paragraph to explicitly await a second operand ('~[A][B == [A][B').

Aside: There are simpler hand implementations for z. That one 'ic~[iwba[c]b[ic~a[~]]]~' was designed for some nice properties, such as you can easily see the quine, and functions aren't replicated before the instant of use. But a built-in version is both more efficient and more convenient.

Copying and Abstraction

Okay that makes sense, I presume the argument to 'Z' is the actual definition of the function to take the fixed point of, because it can't be a reference, which means the whole definition, no matter how big must be copied. Was this the inefficiency you were referring to? Having to write all the code out in full every time with no ability no name routines is not very useful from a programmers point of view, but I assume templates could be used to generate this code.

Having references in the language is definitely a step up in abstraction from this. Perhaps we could agree that pointers in a machine code (numbers referring to addresses) serve the equivalent purpose to naming in a functional language (machine code function pointer is equivalent to naming a function expression)?

In which case I admit it is possible to define a machine code (bitstream) without pointers, just like it's possible to have combinatory logic which does not have naming. However writing everything out in full without being able to refer to expressions or memory regions seems very limiting of the size of program that can be written.

So whilst I don't think I want to program in a language without naming or pointers, I do now see that this could be the bottom layer of the abstraction hierarchy. My take on the abstraction hierarchy would now be:

- combinatory logic (or equivalent bitstream)
- naming (pointers, Stepanov's animals)
- types (types of names, Stepanov's species)
- interfaces (genera: groups of properties shared by species)

What would come next?

re copying and abstraction

I feel that hierarchy is too awkward for general use.

For example, types don't need to be tied to names. We can compute or check anonymous structural types against anonymous combinators. Naming, or semantic compression, is indeed a primitive form of abstraction. So is typing, where we describe a set of values. But is naming below types or orthogonal to them? I'm inclined to argue naming is an independent axis. Are interfaces truly distinct from types? I'm skeptical.

Also, in the general case, pointers are about more than abstraction. They're also about expressiveness (per Felleisen's formal definition).

Consider: the simplest form of 'naming' suitable for combinatory logic is a content-addressed storage (or other fixed-content storage). The processor doesn't choose names or assign locations. Instead, a separate storage unit accepts content and returns a token (possibly a secure hash) to retrieve the content later on demand. Containment semantics are preserved. The resulting language is exactly as *expressive* as it was before we added naming - no jumps, no mutation, no direct control of data representation. But we do get structure sharing, lightweight data plumbing, and semantic compression.

I feel we should be careful to distinguish expressiveness and abstraction. Thus, mention of combinatory logic or pointers specifically doesn't seem to 'fit' a discussion of abstraction.

Naming etc

What is at the bottom then? What is the least abstract form of computation? Combinatory logic seems pretty low on abstraction, you have to spell out all the details every time because there is no naming or other way to hide the details.

We seem to agree that naming is a primitive form of abstraction, so it must come fairly low down. You must have names to have "objects" that is states, otherwise you just have values. I think that content addressable storage is a valid form of naming, it has the same properties of abstraction, but is not as expressive as a pointer to a mutable location.

Are types orthogonal to names? I agree they are, but they are also more abstract than names, as names only let me refer to the same object, types identify a species of objects, like "Cats". So although we don't need names to have types, types are more abstract than names.

Interfaces are distinct from types because any object can only have one type, but it's type can implement many interfaces. Interfaces would be more abstract than types because many types can have the same interface. Note that interfaces, the way I am defining them are an abstraction of types not the objects. In other words "having four legs" is an interface of "Cats", whether or not any given cat is anonymous or has a name.

So there is clearly progression from less abstract to more abstract here, but perhaps it's not clearly separated into nice layers.

Speculating about what might be more abstract than interfaces, I think something like a "Protocol" which would be something like a GADT. Something that captures the abstract behaviour of a system of abstract datatypes.

Physics

What is at the bottom then? What is the least abstract form of computation?

I don't think we can define this for 'computation' which is a mathematical concept (so the foundation is any set of axioms we choose; math is independent of physics).

But for 'physical computation' the least abstract form is simply a model of physics. You'd "compile" a program into a physical system. Or conversely, you'd interpret any physical system as a program. As a philosophy, this leads to computational interpretations of the universe (search keywords: computational universe or digital physics).

Not exactly practical, at least without Star Trek style replicators. But practical isn't what you just now asked for.

You must have names to have "objects" that is states, otherwise you just have values.

Gotta disagree with this. State can be anonymous. It's anonymous in physics, cellular automata, term rewriting, and temporal logic.

SLRP is a rewriting model. The specific machine code I chose is confluent up to effects (all orders of rewrites have the same result). That's a very convenient property! However, it wouldn't be a problem for the SLRP architecture to have a non-deterministic machine code, or to have a few effectful rewrite rules. In those cases, it becomes obvious that the stream itself is anonymous state, not just a value.

We seem to agree that naming is a primitive form of abstraction

Indeed. Naming, or reference, is certainly one of the primitive forms of *linguistic* abstraction. Whereas we might also study structural or behavioral abstractions (which we might call 'types') even in context of anonymous combinators.

You've often posited that we cannot have state without names. Although this is incorrect, there is a weaker claim you could reasonably argue: we cannot directly talk about state without names. No imperative assignments, for example, nor any specific queries. (Of course, anything can still be indirectly modeled in a Turing complete system.)

(types) are also more abstract than names, as names only let me refer to the same object

Types can refer to more than one thing, whereas name overloading is not a thing. Gotcha. :p

I agree that types tend to be less specific than common use of names. Not sure if that implies more abstract, though. And it doesn't generalize to references, which could include such things as "all the boys in the yard".

What I'd argue instead is that types and names are abstracting different aspects of a system: types abstract behavior, references abstract language.

Interfaces are distinct from types because any object can only have one type, but it's type can implement many interfaces.

Why can objects only have one type? Which is the one and only type of number 3: integer, natural number, rational numbers, real numbers, odd numbers, prime numbers, numbers between 1 and 10, or something else?

Or is the one type per object an artifact of language rather than an essential feature of objects and types?

Speculating about what might be more abstract than interfaces, I think something like a "Protocol" which would be something like a GADT.

Session types and substructural types are useful for describing behavior, as a suggestion for further contemplation.

-

Duplicate

-

Duplicate

-

Duplicate

VDHL is not a counterexample

Consider VHDL, SAFL, circuit diagrams, various box-and-wire languages. Some of these may involve static memory, and the "compiler" might produce instructions for printing a circuit board. Whether they interface with or utilize addressable memory units internally could be optional, rather than fundamental.

I've just checked VHDL documentation. It is the tier 3 or higher language. And surprise, It also have pointer types in the form of access types (quote from IEEE Standard VHDL Language Reference Manual, 2000):

type ADDRESS is access MEMORY;
type BUFFER_PTR is access TEMP_BUFFER;

In second line they even use "_PTR" suffix. Or more extended example from that document:

type CELL; -- An incomplete type declaration.
type LINK is access CELL;
type CELL is
  record
     VALUE : INTEGER;
     SUCC : LINK;
     PRED : LINK;
  end record CELL;
variable HEAD : LINK := new CELL'(0, null, null);
variable NEXT: LINK := HEAD.SUCC;

Addressable units are different, but they still present.

I'm too lazy to check other provided languages. If they are well developed the tier 3 languages, they will likely introduce the pointers in the some form as well.

If you wil find a well developed tier 3 language without pointers or similar constructs, we coud continue the discussion. As there there are a lot of languages, they could be not examined completely by a single person. So please come with a valid counterexample, rather than throwing the language names into the discussion arbitrarily.

I'm more interested in mature examples of the languages that had some time time to evolve basing on user feedback on the tier 3, than in the early versions that might have not yet gathered actual requirements of the domain.

VHDL and other hardware

VHDL and other hardware description languages do support memory units (because why wouldn't they?) but it is a lot less fundamental.

I'm too lazy to check other provided languages. If they are well developed the tier 3 languages, they will likely introduce the pointers in the some form as well.

If you wil find a well developed tier 3 language without pointers or similar constructs, we coud continue the discussion.

No, you clearly just said you'd be too lazy to even look.

We can model references in any language of sufficient power, so to clarify: I'm not saying these languages cannot use refs/pointers, I'm saying their use is optional in these languages and we can still use them at tier 3 without using pointers.

Where are counterexamples?

Hey! Lazy are you throwing in language names without checking them first. Why should I follow that game?

I provided counterexample to your conterexample. You have listed a list of languages that supposedly do not use pointers. I looked at the first referenced language, and VHDL has pointers. Your statement is invalidated. Period.

It is now your turn to provide a well developed tier 3 langauge that does not have pointers or similar costructs that reference group of objects by name relative to some context.

It is possible design such language, but will people use it if not forced?

Pointers are optinal in C too. It is possible to write:

int main() {
   return 1;
}

There is no pointers in this porgram. And 'false' UNIX program is impelemnted just like that.

It is also possible not to use lambdas with programming on Common Lisp, build it macros provide enough flexibility to code in a procedural style.

The point is that VHDL has introduced pointers during its evolution. And they were introduced as a complexity management tool. It is possible to not to use them, but it is possible not to use other features as well. So what?

I provided counterexample to

I provided counterexample to your conterexample.

I did know that VHDL can work with memory and pointers before you mentioned it. I just don't consider that relevant. What is relevant is that VHDL can represent sophisticated computations without pointers, then compile to hardware that also doesn't use pointers.

You say pointers are 'optional' in C. But to my knowledge, even the simplest compiled C uses pointers.

The point is that VHDL has introduced pointers during its evolution. (...)

The question was never "are pointers useful?" It's "are pointers *necessary* for tier 3 notations (either within the language or their implementations)?" Thus, correctly pointing out that they're useful and popular is simply irrelevant. Popularity and utility simply aren't necessary features to qualify as a tier 3 notation.

The key abstraction of the

The key abstraction of the tier 2 is map from names/numbers to elemental entity in the context of notation. The things are connected on the same structural level. This map with horizonal structures is named pattern in the articles. Elemental entity is with relative to notation, for example for KPN a node in the graph is an elemental entity because it does not decompose further.

The key abstraction of the tier 3 is refrencing another pattern of objects by name/position instead of elemental entity. This apply to both data and behaviour.

Procedures/functions/blocks are reflections of this cogntive operations in the behaviour aspect. Records/pointers are reflection of this cognitive operation in the data aspect.

VHDL also has two ways to do it: pointers and proceduers/blocks. Block is embedded behaviour reference, procedure is reference by name, the behavior elements are named by ther index position within block or labels.

projections vs pointers in hierarchies

I find the Unison Web project interesting in this context. The 'names' used to refer to other functions at program layer are secure hashes, ensuring 'containment' semantics. But Unison Web leverages projectional editing to provide a human-friendly veneer, rendering those secure hashes according to user-specific database of names. Arguably, we could do similarly with structured subprograms, perhaps rendering `(λx.x)` as `I` and `(λf.(λx.f (x x))(λx.(f (x x))))` as `Y` and `(λx.λy.λz.(x z (y z)))` as `S`.

Due to such possibilities, I feel that "names" should not necessarily be considered "pointers" or "references". Because they might instead be "projections".

Even ignoring projectional editing, I think it's reasonable to consider names to represent a compressed 'containment' relationship, so long as names aren't used in ways that can only be explained via references/pointers (such as naming shared state, certain recursion schema, goto label variables, etc.).

The basic idea of hierarchy is replacing a part of the pattern with its representative.

Patterns describe and organize objects. Hierarchies describe and organize patterns. Building bottom-up seems to fit the idea of patterns as 'projections' better than it does patterns as 'blueprints'. Whether we replace the representative by the pattern or the pattern by the representative seems independent of your tiers of abstraction.

The important thing is that on the level of the parent object, the part of the content is packaged into separate entity and that entity is referenced by representative.

So you agree it depends on how the 'objects' are represented. But you seem to be considering a specific way to represent them (as "separate" entities). Your own article uses "by containment or by reference" when describing hierarchies. Why not attend to that intriguing "by containment" option?

IMHO it is even not possible to create acyclic graph w/o pointers. Only tree is possible.

Depends on what you mean by "create a graph".

Via adjacency matrix, I can represent cyclic graphs. Via fixpoint computations, I can model general recursive computation. If I have both fixpoints and limited support for call-by-need computation, I can support infinite, loopy co-data structures. We can construct Kahn Process Networks in terms of second-class process and port identifiers, and these KPNs would support cyclic streaming computations.

You could serialize these graph-like objects (including some code) using a simple string without internal anchors or offsets.

Thoughts: Shared state doesn't require references. There are lots of ways to share state that don't involve aliasing it. Such as scheduling a bunch of `State -> State` threads over a shared `State` value. Immutable objects don't require references. We can deep-copy our values instead of optimizing memory representation via references.

It's only a specific approach to shared state through references - aka aliased state - that strongly implies use of references. And that's obviously because references/pointers are assumed in the definition of 'aliasing'. Whether aliasing is somehow fundamental to physical computation is a separate question.

if we have to support unknown depth structures, we could not use field offset, we have to use pointers internally to store locations of embedded objects

Not really. You could use a string of variable size to represent an arbitrary tree, no need for offsets or pointers. But it wouldn't be efficient to manipulate a tree via this representation. Hence, the "A 'Tree' would usually be implemented via pointers because that's efficient" assertion I made earlier.

Fixpoint operation creates a structure of the unknown memory size and unknown internal structure. I also have a trouble of imagining how it could be implemented without pointers.

It isn't difficult. Start with SK combinators. Develop a small-step string-to-string evaluator. The S combinator has behavior `Sxyz == xz(yz)`. Whenever this is applied, literally copy the `z` substring twice into the result. Next, implement fixpoint and some toy recursive functions via SK combinators, then run it via your evaluator. It won't be efficient. But it's not difficult.

all attempts (known to me) at the languages with purely immutable structures were at the tier 4

I suggest you look into cellular automata and term rewrite systems. :D

This hints that black boxes are the minimum tier for hiding shared state.

Do you know about opaque data types vs abstract data types? Opaque data types are essentially the second-class form of abstract data types. Opaque data types are sufficient to hide shared state, and the API over the opaque type represents a contract. But opaque types don't give us any ability to abstract over multiple implementations of this contract, so I feel they fall a bit short of tier 4.

languages w/o shared state are currently an exception

Sure. Better to test a general classification against all the edge cases, tho. Including pure languages, tacit languages, concatenative languages, combinatorial logic, term or graph rewrite systems, Kahn process networks, cellular automata, multi-agent systems, ant-based computing, Unison Web where secure hashes serve as compressed containment of large subprograms, etc..

I think that shared state is a fundamental

Shared state is essential to some problem domains. We're all living on the same ball of mutable state, after all. Applications developed in a programming language will ultimately need to interact with a shared network and shared databases. But a point worth considering is that all essential shared state is, by nature, outside of the software application. So I wouldn't casually accept a claim that allocation of shared state should be fundamental to programming languages.

Idea of the pointer that we resolve a name to the group values

And the idea of image recognition, pattern recognition in general, and a related subset of projections, is that we resolve a group of values or observable pattern into a human-friendly name. :D

Mental model vs code model

Firstly, let’s make a distinction: the model in the mind and the model in the code. These models sometimes match, sometimes are different. The abstraction tiers of these models can be different. The greater is the difference, the more transformations are needed to work with code. So, this increases cognitive load.

It is possible to write object-oriented program in C. C GUI toolkits are evidence of it. They are ugly, inconsistent, but they work. However, it is much easier to work with C++ or Java wrappers over these C GUI toolkits, because distance between the model in the mind and the model in the code is less. Therefore, less transformations in mind are required to achieve the goal. This cognitive overhead reduction is one of main reasons why the new generations of programming language are adopted.

… Arguably, we could do similarly with structured subprograms, perhaps rendering `(λx.x)` as `I` and `(λf.(λx.f (x x))(λx.(f (x x))))` as `Y` and `(λx.λy.λz.(x z (y z)))` as `S`.
Due to such possibilities, I feel that "names" should not necessarily be considered "pointers" or "references". Because they might instead be "projections".

Patterns describe and organize objects. Hierarchies describe and organize patterns. Building bottom-up seems to fit the idea of patterns as 'projections' better than it does patterns as 'blueprints'. Whether we replace the representative by the pattern or the pattern by the representative seems independent of your tiers of abstraction.

You are starting at very high-level abstraction. The lambda calculus is the tier 4. While intention of the ‘pattern’ concept in my articles is of lower level. The pattern/template is flat set of labeled objects. The behavior template/pattern is set of labeled actions with transitions between them – recipe that have numbered/named steps and transitions between steps by name/order. The structural pattern is flat mapping to values. Like global variables in BASIC or FORTRAN 66, or global labels in Assembler.

The logic of pattern level is transduction, or by analogy. So, we match situation with pattern, take objects that matched holes in the pattern, and insert those objects into another pattern. The transition between the tier 2 and the tier 3 happens when we able to refer not only to simple values, but structures as well by name. So, code and data structure stop to be flat globally, but each node in this structure is still flat mapping from name to value or structure.

So you agree it depends on how the 'objects' are represented. But you seem to be considering a specific way to represent them (as "separate" entities). Your own article uses "by containment or by reference" when describing hierarchies. Why not attend to that intriguing "by containment" option?

I guess you are well aware that C++ and many other languages have option to reference object allocated separately or declare it as part of the object: “my_subobject test;” or “my_subobject* test;”.

Depends on what you mean by "create a graph".

Via adjacency matrix, I can represent cyclic graphs. Via fixpoint computations, I can model general recursive computation. If I have both fixpoints and limited support for call-by-need computation, I can support infinite, loopy co-data structures. We can construct Kahn Process Networks in terms of second-class process and port identifiers, and these KPNs would support cyclic streaming computations.

You could serialize these graph-like objects (including some code) using a simple string without internal anchors or offsets.

The ways you are specifying require additional steps between model in the mind and model in the code. The references in mutable objects are much more direct representation of the graph. Particularly if nodes are of different type and relationships have different semantics. Those matrix representations add cognitive load on the developer, as the developer has to maintain additional mappings between problem domain and the code. Java and C# just allow to write code with non-uniform objects and their relationships. When working with object graph, most of the logic is local to the node, and then it propagates to referenced nodes. Whole uniform graph processing (where marix representation shines) are very rare cases.

Thoughts: Shared state doesn't require references. There are lots of ways to share state that don't involve aliasing it. Such as scheduling a bunch of `State -> State` threads over a shared `State` value. Immutable objects don't require references. We can deep-copy our values instead of optimizing memory representation via references.

While shared state does not require references, as we could work in assembler with integer addresses, which provide us mapping from integer keys to values. However, references provide two things: type of content and named address of the content. And we could address shared content knowing what to expect. Those global maps of objects by ID is just a lower level of the abstraction.

Not really. You could use a string of variable size to represent an arbitrary tree, no need for offsets or pointers. But it wouldn't be efficient to manipulate a tree via this representation. Hence, the "A 'Tree' would usually be implemented via pointers because that's efficient" assertion I made earlier.

The point you are missing that when we are working with data structures in a programming language, the programming language provides us multiple services, like error checking, code exploration support, etc. We lose all these services with using string as primary data structure. I’ve written programs with TCL/Tk and it was real pain as strings were the only data type supported.

Fixpoint operation creates a structure of the unknown memory size and unknown internal structure. I also have a trouble of imagining how it could be implemented without pointers.

It isn't difficult. Start with SK combinators. Develop a small-step string-to-string evaluator. The S combinator has behavior `Sxyz == xz(yz)`. Whenever this is applied, literally copy the `z` substring twice into the result. Next, implement fixpoint and some toy recursive functions via SK combinators, then run it via your evaluator. It won't be efficient. But it's not difficult.

Ok. In one tier 4 language we could implement interpreter for another tier 4 language. What is the point? Could you build a stack up to machine code without pointer? Or could you build cost-effective hardware that implement lambda expressions natively? When building an implementation stack, it makes sense to start from complex and move to simpler and directly executable abstractions like machine code. What is the point of moving from complex to complex or from complex to even more complex?

all attempts (known to me) at the languages with purely immutable structures were at the tier 4

I suggest you look into cellular automata and term rewrite systems.

“cellular automata” – global shared state in the form of board. “term rewrite systems” – model of work is changing of shared term. What relationship they have to immutable state? Or would you argue that SQL “update” statement is also about immutable systems?

Do you know about opaque data types vs abstract data types? Opaque data types are essentially the second-class form of abstract data types. Opaque data types are sufficient to hide shared state, and the API over the opaque type represents a contract. But opaque types don't give us any ability to abstract over multiple implementations of this contract, so I feel they fall a bit short of tier 4.

Opaque data types and abstract data types are attempts to implement the tier 4 concepts in the tier 3 language. This is an example of difference between mental and code model. The correlation between methods has to be maintained manually. Java and other OO languages allow to achieve such goal more cleanly and with much lesser gap between mental model and code model.

Sure. Better to test a general classification against all the edge cases, tho. Including pure languages, tacit languages, concatenative languages, combinatorial logic, term or graph rewrite systems, Kahn process networks, cellular automata, multi-agent systems, ant-based computing, Unison Web where secure hashes serve as compressed containment of large subprograms, etc..

Any article has own scope. And also, them model is applicable to abstractions that are used by humans and thus to programming languages used by humans. If you wish, you could do a classification yourself. For example, Kahn process networks – are the tier 2 notation (basing on Wikipedia article): there is globally named nodes and links between them, there is no sub-nodes in the flow diagram, and there is no free recursion between processes, process links are static. Ant-based computing is an optimization technique rather than notation. You could check others as yourself and try to fit them into the proposed model.

Shared state is essential to some problem domains. We're all living on the same ball of mutable state, after all. Applications developed in a programming language will ultimately need to interact with a shared network and shared databases. But a point worth considering is that all essential shared state is, by nature, outside of the software application. So I wouldn't casually accept a claim that allocation of shared state should be fundamental to programming languages.

Yea. This philosophical point could be argued for. Computations that do not have a useful effect are useless. This is why proponents of functional programming always call these useful intended effects “side effects”. Also, the age of batch jobs is gone. Now many applications are interactive, they could not take input as argument and offload output at the end of the job. And their internal state of the applications suddenly becomes important, and the languages that support convenient work with such shared state in the straightforward way are more useful for such application kinds as they require less of the model translations.

the model in the mind and

the model in the mind and the model in the code. These models sometimes match, sometimes are different. The abstraction tiers of these models can be different. The greater is the difference, the more transformations are needed to work with code. So, this increases cognitive load.

In context of projectional editing, you should add an automated lens between mind and code. This lens can simultaneously reduce the cognitive load on the mind and reduce user-interface overheads in the code, resulting in a model that's simpler for all observers of the codebase (humans, bots, compilers, etc.).

You are starting at very high-level abstraction. The lambda calculus is the tier 4.

Whether lambda calculus supports high tier abstraction is irrelevant to the point I was making about projections and names.

you are well aware that C++ and many other languages have option to reference object allocated separately or declare it as part of the object: “my_subobject test;” or “my_subobject* test;”

Of course. But C++ has difficulty representing 'containment' of recursive data structures. Whereas, in purely functional languages, a type like `type Tree a = Node a (List of Tree a)` only ever represents a containment relationship. It could be implemented with pointers for performance, or as a giant string just to make a point that pointers aren't required.

The references in mutable objects are much more direct representation of the graph. [...etc...]

Working with graphs in a language without mutation is not particularly difficult. Matrices are awkward for some problems, but I did mention three other approaches. Further, if we admit references but exclude mutation, we can add adjacency lists, relational tables, and other ways of representing graphs that utilize labeled vertices.

In any case, I grant that mutation does allow easy construction of cycles. Whether or not this is a "good thing" is a separate question. I find I rarely need cyclic objects *unless* I'm working with mutable objects. And when I do, the cycles are a common source of subtle bugs (reentrancy, etc.).

While shared state does not require references, as we could work in assembler with integer addresses, which provide us mapping from integer keys to values. [...]

I'd still consider those to be pointers or references. A pointer or reference is a first-class value that we use to index an external resource. We can speak of nice properties for a reference - self describing, strong typing, self-validating, multi-homing, cryptographically unforgeable, etc.. But a reference in the absence of nice properties would still be a reference with respect to computation.

The point you are missing that when we are working with data structures in a programming language, the programming language provides us multiple services, like error checking, code exploration support, etc. We lose all these services with using string as primary data structure.

No, you're clearly missing my point. I'm saying the language's *compiler* could implement the tree as a string. It wouldn't be efficient. Indeed, it'd be utterly pointless EXCEPT for this one specific point: "yes, we can implement trees without using pointers, suck on that const". We wouldn't lose ANY of those nice services you just mentioned. Construction and manipulation of the tree would still be strongly type-safe, for example. Indeed, other than the crawling slowness and extra warmth from the CPU, the programmer wouldn't even notice that the tree is implemented as a string.

Ok. In one tier 4 language we could implement interpreter for another tier 4 language. What is the point?

You said, "I also have a trouble of imagining how it could be implemented without pointers". I was telling you how it could be implemented without pointers. That is the point. Your lack of imagination. . . disturbs me.

Could you build a stack up to machine code without pointer? Or could you build cost-effective hardware that implement lambda expressions natively?

I wouldn't start with the lambda calculus. But we can develop a low-level set of purely functional concatenative combinators that operate on a stack similar to a Forth machine - a purely functional abstract machine code, of sorts. We can build upon this monadic programming or algebraic effects to support effectful programming, if we wish.

For performance, I know of two routes. Either we have an expanding set of 'primitive' data types and combinators that can leverage hardware efficiently (like a matrix multiply operator to leverage SIMD), or we can develop software accelerators - basically, compiler-intrinsic functions with a reference implementation - so we can add 'performance primitives' to the language without adding new 'semantic primitive' combinators. Sort of a CISC vs RISC philosophy.

There are several languages in this vein: Joy, Cat, Kitten, my Awelon. It wouldn't be difficult to adapt J or K languages to a purely functional foundation, either.

When building an implementation stack, it makes sense to start from complex and move to simpler and directly executable abstractions like machine code. What is the point of moving from complex to complex or from complex to even more complex?

You seem to be confusing "high tier of abstraction" with "complex". Lambda calculus isn't particularly complex. Combinatory logic is even less so. And concatenative combinators would give us nice, streamable code as our foundation - much simpler than machine code with its awkward backwards jumps.

Compiling goes from simple abstract machine code to complex physical machine code, not complex to simple.

“cellular automata” – global shared state in the form of board.

Incorrect. Cells cannot mutate even their own state, much less that of their neighbors or the rest of the board. The board is immutable. Instead, we compute the 'next' state of each cell, thus computing a new board. This is best understood with immutability, and perhaps a simple temporal logic. It's also below tier 2.

“term rewrite systems” – model of work is changing of shared term. What relationship they have to immutable state? Or would you argue that SQL “update” statement is also about immutable system?

The term itself is immutable. We produce a new term after each rewrite. This is no different to how adding or modifying an entry to an immutable tree produces a new tree, or how incrementing 7 returns 8 rather than mutating all 7s everywhere.

SQL 'update' does modify a variable that contains a table. But the table itself is supposedly an immutable data structure (per relational algebra).

Kahn process networks – are the tier 2 notation (basing on Wikipedia article): there is globally named nodes and links between them, there is no sub-nodes in the flow diagram, and there is no free recursion between processes, process links are static.

Wrong on most points. KPNs abstract the 'process' so it can be another KPN. This gives us tier 3 hierarchies. Channels may be locally named within each process network, e.g. we attach the channel to an open port on a process. Cyclic relationships are permitted - we can even loopback a process's outputs directly. But it's true that links are static.

The article you found was just describing the essential properties of KPNs. The example diagram you found is not really a usable notation.

Ant-based computing is an optimization technique rather than notation.

Ant-based computing is not just an optimization technique. It has been used in that way, but it is also explored as a universal computation model by itself. However, I grant it's more like "OOP" or "publish-subscribe" - a paradigm rather than a notation.

Computations that do not have a useful effect are useless. This is why proponents of functional programming always call these useful intended effects “side effects”.

As a proponent of functional programming: We call them effects. We think effects are both important and troublesome, so we handle them with care: effects are separated from logic where feasible, and typed where used. We do shun "side effects" - effects that are hidden from the client of an API. But that doesn't mean we ignore "intended effects".

This proponent who always calls intended effects 'side effects' is misinformed about FP, is lying to you, or is a straw man you fabricated with intention to burn.

internal state of the applications suddenly becomes important, and the languages that support convenient work with such shared state in the straightforward way are more useful

Gotta disagree with you in the first half. Internal state is 'poison' to an application - it greatly hinders extension, administration, debugging, durability, and interaction between apps. Better to shove any application-specific state out to a database or filesystem.

I do agree that the language needs to support convenient work with shared state. But not as a primitive. We can model it in a library, as an 'effect'.

What is mutable and what is immutable?

“cellular automata” – global shared state in the form of board.

Incorrect. Cells cannot mutate even their own state, much less that of their neighbors or the rest of the board. The board is immutable. Instead, we compute the 'next' state of each cell, thus computing a new board. This is best understood with immutability, and perhaps a simple temporal logic. It's also below tier 2.

“term rewrite systems” – model of work is changing of shared term. What relationship they have to immutable state? Or would you argue that SQL “update” statement is also about immutable system?

The term itself is immutable. We produce a new term after each rewrite. This is no different to how adding or modifying an entry to an immutable tree produces a new tree, or how incrementing 7 returns 8 rather than mutating all 7s everywhere.

SQL 'update' does modify a variable that contains a table. But the table itself is supposedly an immutable data structure (per relational algebra).

The board and cells for cellular automata and term for term rewrite systems have the same logical identity among different versions modified by rules. Cells for cellular automata also have an identity among different versions.

The intended effect is change this object from one version to another. Using readonly snapshots could be implementation technique for such systems, but intention is changing that logical object. And they are often implemented as mutable objects.

Your argument could be extended to say that Java (or Turing Machine) works with immutable objects as we could consider its semantics as driving memory state from one memory snapshot to another according to rules of the programming language. This could be useful point of view for proofs, mental model of Java developer is usually different. Advantage of Java is that we usually do not need to consider the global memory state, but we could consider local changes.

IMHO you are stretching immutability too far, even to objects that logically are not. Any mutable object could be considered as sequence of immutable snapshots, and this often done in proofs. So we should say that everything is immutable? If association between logical identity and state changes, this logical object is mutable IMHO. So board in cellular automata is mutable logical object, as it has logical identity and changing state. Cell is mutable object as well. Turing Machine is a mutable object as well.

Re what is mutability

What 'mutability' means is a good question. I'll try to answer it to my understanding.

In lambda calculus, objects having 'identity' is not uncommon. E.g. when we fold over a list, we have an 'accumulator' variable that effectively has an identity - its role as accumulator - for the duration of fold. We can think about our manipulations to this variable as mutations. If the variable is of simple value type, like an integer, there's nothing more to think about. But if it's a structured tree, then we need to understand that modifications to the accumulator won't influence prior versions of the tree. This is the basic idea of immutability: that updates are local, that we aren't affecting observers of the prior version.

So mutability is the alternative condition: the updates can be observed externally.

Your argument could be extended to say that Java (or Turing Machine) works with immutable objects

Agree for Turing machines, where we only locally manipulate the head of a tape and the tape is supposedly hidden from external observers until computation halts. But not for Java, where a modified field can be observed by other threads.

Any mutable object could be considered as sequence of immutable snapshots

Indeed. The question is whether it's useful to think about it that way in context of developing the program itself. For cellular automata, it's quite useful to know that the order in which we 'update' cells is irrelevant, because we aren't mutating the cell's current version. For term rewriting, it's useful to know that two copies of a term can be moved into different contexts and rewrites on them are subsequently independent.

Or conversely, for Lisp it's important to know that a spurious 'setcdr' by a callback could ruin your tree. For Java, it's important to understand that modified variables affect other threads.

Aside: ur quotes all fubar, plz fix

Projectional editing

In context of projectional editing, you should add an automated lens between mind and code. This lens can simultaneously reduce the cognitive load on the mind and reduce user-interface overheads in the code, resulting in a model that's simpler for all observers of the codebase (humans, bots, compilers, etc.).

I do not think that projectional editing is relevant to disucssion here, as it is not related to issue of abstractions tiers used.

Also none of attempts at projectional editing really took off. There is a number of technical reasons for it: bad support from version control, bad support for code snippets and copy/paste, bad support for working with temporary incorrect code, and so on. I think that there is a logical problem under all these technical problems, but this is a topic for a separate discussion.

Re projectional editing

I do not think that projectional editing is relevant to disucssion here, as it is not related to issue of abstractions tiers used.

Several of your statements are based on an implicit assumption that the abstractions tiers are supported in a specific way - by modifying the stored code representation, with names as references between code. Projectional editing is relevant insofar as it contradicts this assumption.

Here's something 100% irrelevant, though:

Also none of attempts at projectional editing really took off.

Notation is what user is working with

Several of your statements are based on an implicit assumption that the abstractions tiers are supported in a specific way - by modifying the stored code representation, with names as references between code. Projectional editing is relevant insofar as it contradicts this assumption.

In the notation user works with, the user still see names that are resolved to object. Projectional editing tools just pre-resolve these names during editing process, and name resolution is sometimes ambigous depending on context. So, I do not see any contradition here.

Here's something 100% irrelevant, though:

Also none of attempts at projectional editing really took off.

If language technology repeatably fails to take off, this is an indication of some contradiction between how technology works and human brain works. So there is a problem with it as with a notation intended for human consumption.

The same argument is applicable to pure functional languages, that fail to occupy a significant market share despite all efforts from academia. Some elements of functional paradigm were integrated in OO languages mostly as shortcuts for OO constructs (like lamdas were adopted as shortcut for single method interface implementation). This is an indication of usability problem due to cogntive aspects of the notation. OO languages seems follow the way brain works better. In the personal experience, learning Java is easier than learning Haskell.

In the notation user works

In the notation user works with, the user still see names that are resolved to object. Projectional editing tools just pre-resolve these names during editing process, and name resolution is sometimes ambigous depending on context. So, I do not see any contradition here.

The contradiction is with something you said earlier: "The basic idea of hierarchy is replacing a part of the pattern with its representative. The representative is a pointer in general case."

Basically, you were taking the stance that pointers are needed for tier 3 because representing hierarchal structure requires pointers.

This *despite* explicitly admitting 'single object patterns' and 'containment' in your article.

The phrase was loosy

That phase was loosy and used word pointer in wrong way. It should have said "The representative is something that could be resolved to that referenced pattern, for example a pointer." or something like, the article has gone through more refinement cycles than comment, so it is usually more precise.

I still do not see relationationship to projectional editing.

Kahn process networks

I've not worked with this model, so I'm possibly missing something. But let's consider articles referenced from wikipedia.

Basic article by Khan (Kahn, G. (1974). Rosenfeld, Jack L., ed. The semantics of a simple language for parallel programming), it notes possiblity of recusion (sections 4), but it does not define it fully. From the text, it obvious that author has only vagues ideas on how it could be done.

This thesis (Parks, Thomas M. (1995). Bounded Scheduling of Process Networks (Ph. D.). University of California, Berkeley.) does not mention recusion at all.

This article (Geilen, Marc; Basten, Twan (2003). Degano, P., ed. Requirements on the Execution of Kahn Process Networks.) also does not mention recusion at all.

While Khan had some vague ideas about recusion, these ideas have not been implemented. Particually, notation does not have notation to spawn a new process with parameters, which is critical for recusive processes. All processes in the actual notation are pre-created and work as parts of pipeline. I would argue that the tier 3 notation for process would have been different (for example, like this).

The minimal test for the notation night have been specify Ackermann function in recurive form, that we send a pair of numbers and receive a result.

So, after more detailed consideration, I'll still classify it as the tier 2 notation in the current form, potential developments could change it, but these developments did not happen at the current time.

KPN recursion

We can have hierarchical KPNs, and cyclic KPNs. But it seems you mean something else by 'recursion', that one KPN can evaluate another.

In my language, KPN descriptions are represented as values. And 'running' a KPN is a pure (confined, deterministic) computation, albeit benfiting from specialized runtime support. So, KPNs can be passed on channels and evaluated recursively. But I grant I haven't seen this feature elsewhere.

But regardless of recursion, AFAICT, *hierarchical* KPNs (with local channels) are sufficient to qualify for your tier 3. Because it allows us to name patterns of processes and wire them together, then name this pattern of patterns for further use - that's how you define tier 3.

Why do you believe recursion or spawning is even relevant?

hierarchical and cyclic KPNs

I do not undestand what do you mean under hierarchical KPN. Any examples in articles?

Hierarchical KPN might be where we define a node, and this node has own substructure names local to this subnone and local connections. And this needs to happen within notation. Like nested blocks in VHDL. I do not see these things in KPN papers that I have looked at. Original Khan's paper also does not discuss it. There might be a dialect with hierarchical KPN suport, but it should be considered separately, like FORTRAN 66 is a tier 2 langauge, FORTRAN 77 is a tier 3 langauge, and FORTRAN 2003 is a tier 4 language.

Cyclic KPNs do not increase an abstraction level as we have a flat namespace of nodes and these nodes are connected within pattern. It is just like the following BASIC program is cyclic, but it is cyclic with the tier 2 notation as there is a flat mapping from line number to actions and oprations refer to this mapping ("next" operator uses variable name as name for "for" operator line) (the sample is taken from wikipedia):

5 LET S = 0
10 MAT INPUT V 
20 LET N = NUM 
30 IF N = 0 THEN 99 
40 FOR I = 1 TO N 
45 LET S = S + V(I) 
50 NEXT I 
60 PRINT S/N 
70 GO TO 5 
99 END

Hierarchical KPNs

To clarify, Kahn Process Networks are not a 'notation' by themselves. They instead define a set of constraints on communication between processes to guarantee a deterministic outcome while supporting parallel computation and incremental interaction. It's these properties that make KPNs great for high-performance computing in context of a purely functional language.

However, there are many notations that are a great fit for KPNs, such as box-and-wire diagrams in component-based programming. Consider John Paul Morrison's Flow Based Programming (FBP), which has a large number of graphical implementations and a reasonably large following. We could easily develop KPNs using a subset of FBP notation (e.g. forbidding non-deterministic merge of output wires).

Hierarchical KPNs are what we get when we take "open" KPNs (a KPN with some unbound input or output channels), give them a name, and allow them to be be reused as component processes in a parent KPN. Thus, a KPN is just another process from the perspective of a parent KPN.

So, the common notations we'd use for KPNs in practice today are tier 3. Same as FBP.

Where is hierarchical KPN described?

The tier model is applicable to notations and how abstractions in it play together. If something is not a notation, the model should be applied with care.

The graphical notation in original Khan article, dissertation, and other refrenced article are tier 2. And I have not found resuse of graph that you are describing with quick scan of the original KPN. It has only operation of joinig KPNs together. So, could you point to articles that confirm such usage KPN. Or point to part of original Khan article that describes such usage.

I have feeling that you are extending KPN model with additional mechanisms, but might be some other notation like "structured KPN". That would be different from original KPN in the same ways as FORTRAN 77 is different from FORTRAN 66. The tier evaluation is related to the specific version of the notation, any additional extensions could change evalution.

Subset of something does not necessary share the same abstraction tier as original notation as relveant parts might be not included into the subset.

Flow-based programming also refers to many different langauges with different features. Graphical samples are the tier 2 notations visually, there might be the tier 3 semantics attached to nodes, but it is not obvious from diagrams. Please specify as specific language and specific version that you mean, and only than we could discuss the tier.

Hierarchical KPNs

The wikipedia article on KPNs already defines 'open' KPNs. Hierarchical KPNs are what happen when you cut a circle out of a KPN and give the resulting open KPN a name for reuse. The correctness of doing so is very nearly 'trivial' - we can prove equivalence by inlining a component KPN back into the original diagram (and perhaps some alpha renaming of internal channels).

It's a trivial and obvious extension, so I've never even thought to look for old articles on the subject. For your warning about 'applied with care' - yeah, done that.

In any case, when I spake of KPNs earlier, I'm indeed talking about the hierarchical structured forms where we can name KPN patterns and use them as components for larger KPNs, which is sufficient to reach tier 3 (because we now use named patterns as nodes within our patterns).

Graphical samples are the tier 2 notations visually, there might be the tier 3 semantics attached to nodes, but it is not obvious from diagrams.

FBP is at least tier 3. A central tenet of FBP is developing reusable 'components' that are essentially black-boxes referenced by name. But it's hierarchical nodes included by name, so tier 3, instead of patterns by contract, which would be tier 4. According to that first article, at least.

Hierarchical KPNs seem to be extension of model

So, you are using some extended notation that is not described in the articles. The Khan's notation is a simple graph. Addition of the tier 3 will require naming graphs, creating copy of graph (possibly with node rename) on use, and than connecting ends of the copy to ends of the node. While it looks simple and intuitive, the logic is not so trivial, and there is no traces of this logic in Khan's artricle.

As I understand Khan article, the implied reuse model was connecting inputs of one graph to output of another graph without creating copies (connecting boundaries). Your reuse model is invoking copies and connecting them in place (connecting from inside with substitution).

I'm not sure what reuse model different notations of FBP family use, as both cases are reuse, but reuses of the different tier. Maybe different languages use different models.

Re: model in mind and model in the code

Of course. But C++ has difficulty representing 'containment' of recursive data structures. Whereas, in purely functional languages, a type like `type Tree a = Node a (List of Tree a)` only ever represents a containment relationship. It could be implemented with pointers for performance, or as a giant string just to make a point that pointers aren't required.

This is relationship by reference in the implementation. Because only “by reference” representation allows achieve acceptable performance. The fact of by reference or even lazy semantics is widely used in implementation of the collection libraries. And this could not be changed without changing fundamental performance characteristics of the language. Lage parts of Haskell code will stop to work if change lazy to eager evaluation due to usage of infinite lazy structures.

Working with graphs in a language without mutation is not particularly difficult. Matrices are awkward for some problems, but I did mention three other approaches. Further, if we admit references but exclude mutation, we can add adjacency lists, relational tables, and other ways of representing graphs that utilize labeled vertices.

All these ways add a need for additional translation of models in the mind just like a matrix representation, thus creating additional cognitive load. And also, these ways have poor support for non-uniform graphs.

I'd still consider those to be pointers or references. A pointer or reference is a first-class value that we use to index an external resource. We can speak of nice properties for a reference - self describing, strong typing, self-validating, multi-homing, cryptographically unforgeable, etc.. But a reference in the absence of nice properties would still be a reference with respect to computation.

Pointer in the language and pointer in the mind are different things. On assembly or machine code layer we have numbers. We have typed pointers on the language layer in the tier 3 and above languages. Integers on machine code layer are implementation of pointer. Machine code just have indexed memory access rather than references. This is a different logical concept. The pointers as typed object were invented only in the year 1964.

No, you're clearly missing my point. I'm saying the language's *compiler* could implement the tree as a string. It wouldn't be efficient. Indeed, it'd be utterly pointless EXCEPT for this one specific point: "yes, we can implement trees without using pointers, suck on that const". We wouldn't lose ANY of those nice services you just mentioned. Construction and manipulation of the tree would still be strongly type-safe, for example. Indeed, other than the crawling slowness and extra warmth from the CPU, the programmer wouldn't even notice that the tree is implemented as a string.

Now to complete the thing you need to implement a string without usage of pointers or indexed memory access. Also, the string interpreter itself could not use pointers or indexed memory access for support structures.

You said, "I also have a trouble of imagining how it could be implemented without pointers". I was telling you how it could be implemented without pointers. That is the point. Your lack of imagination. . . disturbs me.

Yea. I lacked imagination that someone consider implementing things not my moving close to the hardware, but possibly moving further from it. Actual implementation techniques use pointers. To implement them on realistic hardware one to have to use pointers, on go to evet to lower level and use addressed memory access.

I wouldn't start with the lambda calculus. But we can develop a low-level set of purely functional concatenative combinators that operate on a stack similar to a Forth machine - a purely functional abstract machine code, of sorts. We can build upon this monadic programming or algebraic effects to support effectful programming, if we wish.

For performance, I know of two routes. Either we have an expanding set of 'primitive' data types and combinators that can leverage hardware efficiently (like a matrix multiply operator to leverage SIMD), or we can develop software accelerators - basically, compiler-intrinsic functions with a reference implementation - so we can add 'performance primitives' to the language without adding new 'semantic primitive' combinators. Sort of a CISC vs RISC philosophy.

There are several languages in this vein: Joy, Cat, Kitten, my Awelon. It wouldn't be difficult to adapt J or K languages to a purely functional foundation, either.

Forth uses stack as shared mutable data structure. Selective check with Joy and Kitten shown that they “dup” operator as well, so they rely on mutable and explicitly accessed data structure - stack.

As a proponent of functional programming: We call them effects. We think effects are both important and troublesome, so we handle them with care: effects are separated from logic where feasible, and typed where used. We do shun "side effects" - effects that are hidden from the client of an API. But that doesn't mean we ignore "intended effects".
This proponent who always calls intended effects 'side effects' is misinformed about FP, is lying to you, or is a straw man you fabricated with intention to burn.

You have conveniently added word “always” to the context. But let’s look to the documentation that is addressed to imperative programmers at this link. I think it is representative, as it is specifically discussing this issue from point of view of Haskell programming. All usages of word effect on this page are either “side-effect” or “side effect”. So, people that consistently use the words “side effect” in the context of at least one article do exists.

I’ve seen some papers that discusses word effect in other way, but they do not seem to be representative. Even Wikipedia article insists on side effect as main name of the term, and in the article about referential transparency as well. So even if these articles are incorrect, none has bothered to change them, so they seem to be acceptable for community.

Because only “by

Because only “by reference” representation allows achieve acceptable performance.

You're terrible at this whole 'logic' thing, aren't you? Since you're trying this fruitless direction, you next need to prove the impossible: that abstraction tiers have performance requirements.

AFAICT, your abstraction tiers have no hard requirement for performance, popularity, cognitive load, or even utility. So 99.9% of what you've tried to argue (in earlier posts, too) seems irrelevant to specific questions like "do tier 3 languages need pointers"?

they “dup” operator as well, so they rely on mutable and explicitly accessed data structure - stack.

Dup is purely functional.

dup : (s * a) -> ((s * a) * a)

Fact is, we aren't sharing the stack, so it isn't shared state. It can be understood as a linear value.

You have conveniently added word “always” to the context.

Nope. I only emphasized it. You said, "proponents of functional programming always call these useful intended effects “side effects”".

I linked an article on algebraic effects, earlier. It is, today, the most popular approach to effects in new FP languages.

What is your point?

AFAICT, your abstraction tiers have no hard requirement for performance, popularity, cognitive load, or even utility. So 99.9% of what you've tried to argue seems irrelevant.

Yea. Discussion has moved from abstraction tiers discussion long ago, and degraded to OO vs FP discussion where we obviously disagree on many points. This is discussion is not relevant to the topic indeed.

Abstraction tiers are just about abstractions, my main argument is that notations evolve along the same line, and I have not seen clear arguments about that point. And this is closely related to issue how humans interact with notations, so cogntive load is essential to the discussion. With each tier transion the structure of cogntive load changes, and this allow humans to manage a greater amount of elements.

Notations are not for comptuers, notations are for humans to interact with computer. IMHO the best notation is one that allows to get things done in most strainghforward way close to mental models. It looks like we disagree on this point as well, as you seems to value other aspects of notation more.

I do not actually undestand the point that you are trying to make, as I have only seen discussing some small details and random language references that you claim as contradicting the model somehow without saying explicitly what contradition was. Please make your core point(s) clear.

You seems to argue that pointers/references are not essential for the high-tier notations that work with data. However, even Haskell has references and they are essential to its semantics. The argument the values are passed by containment in pure FP is not true from point of view of Haskell, as they are passed by lazy reference.

Mutable/immutable issue is irrelevant to abstraction tiers, except for I have not seen a general-purpose programming language with only immutable structures below the tier 4.

Functional languages (ML, Haskell, LISP) start at the tier 4 at least for behaviour aspect, because lambda abstractions belong to this tier. As such, I do not think that they are a good basic layer describe computing systems and their evolution, while they could be useful for proofs due to Turing-equivalence. You seems to disagree on this point as well. But it does not look strongly relevant to discussion of abstraction tiers.

Dup is purely functional.

dup : (s * a) -> ((s * a) * a)

From cursory glance, the dup operation in these languages does take an explicit argument, it takes implicit stack pointer from context and saves modified stack pointer back to the context from notation point of view. The stack is shared between operations. I have not worked with these languages, but I had some experince with Forth.

In contrast, the haskell does not share stack between operations, the operations do not need to be aware of the stack state at all, and in case CPS-rewirte phisical stack even could be avoided. So there is no mutable state in notation until we start monad-based DSL that allows us to do imperative programming.

So it does not seem to have a signature that you are poposing from usage point of view.

Abstraction tiers are just

Abstraction tiers are just about abstractions

Indeed.

main argument is that notations evolve along the same line

Well, that's perhaps the main point of your overall article.

But refer to your first clause: "abstraction tiers are just about abstractions".

Therefore, abstraction tiers are NOT about the evolution of notation. They are NOT about cognitive load. They are NOT about performance. Further, based on how you describe them, they are NOT about specific abstractions such as pointers or mutation.

You seems to argue that pointers/references are not essential for the high-tier notations that work with data.

Yes, that's what I'm claiming. I won't deny they're sometimes convenient, though.

I have not seen a general-purpose programming language with only immutable structures below the tier 4.

Notations for combinatory logic languages (like SK combinators) would qualify. We could build a functional language above them. Without the 'variables' of lambda calculus, there's no convenient way to "substitute" a hierarchical node (in the notation) with something that will fulfill the contract. Yet, we still have all that Turing powerful computation and immutability.

Lambda calculus can desugar to SK combinators. So, it isn't difficult to lift combinatory logic to tier 4 via projection or syntactic sugar. But it's not there by itself.

From cursory glance, the dup operation in these languages does take an explicit argument

You have it wrong. `(s * a)` is the type of the *implicit* stack argument. The `dup` operator takes as input the data stack with at least one item, then produces a stack with the same item cloned, hence output type `((s * a) * a)`. I find it convenient to represent stack types as tuples like this.

In contrast, the haskell does not share stack between operations, the operations do not need to be aware of the stack state at all

You seem to be confusing Haskell's call-return stack with a stack programming language's data stack. Most stack programming language implementations also use separate call-return stacks.

SK combinators

I have not seen a general-purpose programming language with only immutable structures below the tier 4.

Notations for combinatory logic languages (like SK combinators) would qualify, and we could easily build a language above them. Without the 'variables' of lambda calculus, there's no convenient way to "substitute" a hierarchical node (in the notation) with something that will fulfill the contract. Yet, we still have all that Turing powerful computation and immutability.

Lambda calculus can desugar to SK combinators. So, it isn't difficult to lift combinatory logic to tier 4 via projection or syntactic sugar. But it's not there by itself.

You have missed part a general-purpose programming language. SK combinators are not a general-purpose programming language. It is mathematical model. Becasue of Unlambda, I should have explicitly added "actually used by humans in practice", since it is possible to create a toy language with arbitrary features. But Unlambda purity is under suspicion, since it has IO functios in basis.

The translation of one notation to other notation generally says nothing about their abstraction tiers. After all, everything could be compiled to Turing Machine that is the tier 2 and back.

Also, everything above machine code is just "a syntax sugar". The programming languages are needed for humans to manage increasing behavior complexity. And the higher the tier of the abstraction, the more complex behavior they allow to organize into manageable form.

You have it wrong. `(s * a)` is the type of the *implicit* stack argument. The `dup` operator takes as input the data stack with at least one item, then produces a stack with the same item cloned, hence output type `((s * a) * a)`. I find it convenient to represent stack types as tuples like this.

The execution of dup operator involves retrieving stack pointer and than saving it back in a modified form. The internal code of dup might be immutable, but its excution logically is not if we consider environment where it is executed. While this stack register is implicit, the knowledge about it is critical for semantics of the operations. It might be possible to hide in monad like way and even type it. But from my experience in Forth, soon one need operations dupNth and so on, that are even harder to type and would require more interesting ways to work with stack.

SK combinators are not a

SK combinators are not a general-purpose programming language

Lambda calculus, by itself, is also not usable as a general purpose programming language. You at least need to support libraries of user-defined named functions before it's usable. You might also add shorthand for numbers, lists, strings, etc. so data input is tolerable. You might further need compiler support to integrate with the effectful environment (like Haskell's IO monad).

Yet you have repeatedly shortened this to just "lambda calculus". IMO, if you don't need to fully describe the obvious and necessary extensions to lambda calculus, then I shouldn't need to mention the same for SK or KPN or any other simple base model.

There in fact have been general purpose languages built upon simple combinator bases. My own Awelon language would be an example. But these do receive same treatment as lambda calculus before use as a general purpose language: libraries of named functions, shorthand for concise data input, etc.

everything above machine code is just "a syntax sugar"

I define syntactic sugar more rigorously than that. Syntactic sugar must be modeled by a syntactically local transform.

Thus, a lot of language features - packages and modules, multi-methods, typeclasses, type-based overloading of function names, etc. - are not syntactic sugars because they involve remote elements. Syntactic sugar is also limited to output valid syntax. Thus, output of machine code is only possible for languages with features for embedding machine code.

Or put another way, syntactic sugar has essentially the same limits on extent and features as text preprocessor macros.

execution of dup operator involves retrieving stack pointer and than saving it back in a modified form. The internal code of dup might be immutable, but its excution logically is not if we consider environment where it is executed.

If we consider execution environment, even lambda calculus uses no immutable data structures.

But that's not what 'mutability' as a semantic concern is about.

The following my statement

The following my statement was specifically about general purpose langauges used by humans:

I have not seen a general-purpose programming language with only immutable structures below the tier 4.

So it is not applicable to SK and lamda calculus which are mathematical notations. The notion of tiers is appliable to SK and lambda calculus, but the statement above is not.

It is also not applicable to the toy languages, because when humans use the language on the large applications they have to balance features for usuability. Toy languages do not have a need to balance.

dup

dup

(Entities, Species, and Genera) * (Concrete, Abstract)

This quote from Stepanov's "Elements of Programming" seems relevant:


In order to explain what objects, types, and other foundational computer notions are, it is useful to give an overview of some categories of ideas that correspond to these notions.

An abstract entity is an individual thing that is eternal and unchangeable, while a concrete entity is an individual thing that comes into and out of existence in space and time. An attribute—a correspondence between a concrete entity and an abstract entity—describes some property, measurement, or quality of the concrete entity. Identity, a primitive notion of our perception of reality, determines the sameness of a thing changing over time. Attributes of a concrete entity can change without affecting its identity. A snapshot of a concrete entity is a complete collection of its attributes at a particular point in time. Concrete entities are not only physical entities but also legal, financial, or political entities. Blue and 13 are examples of abstract entities. Socrates and the United States of America are examples of concrete entities. The color of Socrates' eyes and the number of U.S. states are examples of attributes.

An abstract species describes common properties of essentially equivalent abstract entities. Examples of abstract species are natural number and color. A concrete species describes the set of attributes of essentially equivalent concrete entities. Examples of concrete species are man and U.S. state.

A function is a rule that associates one or more abstract entities, called arguments, from corresponding species with an abstract entity, called the result, from another species. Examples of functions are the successor function, which associates each natural number with the one that immediately follows it, and the function that associates with two colors the result of blending them.

An abstract genus describes different abstract species that are similar in some respect. Examples of abstract genera are number and binary operator. A concrete genus describes different concrete species similar in some respect. Examples of concrete genera are mammal and biped.

An entity belongs to a single species, which provides the rules for its construction or existence. An entity can belong to several genera, each of which describes certain properties.

We show later in the chapter that objects and values represent entities, types represent species, and concepts represent genera.

So here we have on tier zero the concrete instead of chaos, and we have concrete entities, species and genera.

Then on tier one we have abstract entities, species and genera.

Parsing of the text

Here is some parsing of the text with repect of the tiers.

An abstract entity is an individual thing that is eternal and unchangeable, while a concrete entity is an individual thing that comes into and out of existence in space and time.

Tier 1. Discusses individual entities

An attribute—a correspondence between a concrete entity and an abstract entity—describes some property, measurement, or quality of the concrete entity.

Tier 2. Discusses flat map from names to values

Identity, a primitive notion of our perception of reality, determines the sameness of a thing changing over time.

Tier 1. Discusses individual entities

Attributes of a concrete entity can change without affecting its identity. A snapshot of a concrete entity is a complete collection of its attributes at a particular point in time.

Tier 2. Discusses flat map from names to values

Concrete entities are not only physical entities but also legal, financial, or political entities. Blue and 13 are examples of abstract entities. Socrates and the United States of America are examples of concrete entities.

Tier 1. Discusses individual entities

The color of Socrates' eyes and the number of U.S. states are examples of attributes.

Tier 2. Discusses flat map from names to values

An abstract species describes common properties of essentially equivalent abstract entities. Examples of abstract species are natural number and color. A concrete species describes the set of attributes of essentially equivalent concrete entities. Examples of concrete species are man and U.S. state.

Tier 3-. Discusses nomital types, but there is no discussion of entity references. The types are supposed to be applied ot attributes, but possibly this is too obvious to the author, so it is not discussed.

A function is a rule that associates one or more abstract entities, called arguments, from corresponding species with an abstract entity, called the result, from another species. Examples of functions are the successor function, which associates each natural number with the one that immediately follows it, and the function that associates with two colors the result of blending them.

Tier 2. Discusses functions but there is no discussion that functions could call other functions, like FORTRAN 66 did not allow the recursion.

An abstract genus describes different abstract species that are similar in some respect. Examples of abstract genera are number and binary operator. A concrete genus describes different concrete species similar in some respect. Examples of concrete genera are mammal and biped.

An entity belongs to a single species, which provides the rules for its construction or existence. An entity can belong to several genera, each of which describes certain properties.

Tier 4. The discussion between abstract and concrete here, so there is some notion of black box is introduced.

We show later in the chapter that objects and values represent entities, types represent species, and concepts represent genera.

Tier 1, 2, 3-, 4. The discussion between abstract and concrete here, so there is some notion of black box is introduced (genera).

Generally discussion is restricted to the tier 1,2, and 4 concept, while underusing the tier 3 concepts. The tier 2 concepts is what is close to the iron, and the tier 4 is current mainsream of the langauges. The tier 3 concept are so natural to mind already that they are barely noticed. This is quite strange for the discussion, as structured programming was a major advancement in understanding what is a program. It was a real revolution.

removed

misplaced