Streaming Language Rewrite Processing (SLRP)

Streaming Language Rewrite Processing (SLRP) is a low-level computing model that is essentially characterized by limited-memory processors rewriting a much larger input stream. Computation under this condition requires multiple passes, but is highly amenable to pipelining and partitioning, and thus parallelism. SLRP processors may have side channels to external devices that we drive through ‘effectful’ rewrite rules.

TLDR: SLRP is a very promising alternative to the von Neumann architecture. SLRP offers advantages for security, concurrency, distribution, and scalability. However, SLRP performs poorly for data plumbing and loops because there is no pass-by-reference within the stream. This weakness can be mitigated by effects, shunting a region of the stream to a remote device and keeping a small placeholder – a reference.

The article develops a Turing complete foundation for a SLRP machine code based on concatenative combinatory logic, and sketches several extensions. The challenge is that the operands for a combinator may be larger than a processor's memory, e.g. for a copy operator `c[X] == [X][X]` the value `[X]` might be larger than what our processor can remember. Thus, a processor cannot simply copy and paste. Solving this involves divide-and-conquer tactics and a notion of open-ended blocks.

mechanics of reduction

I added a 'background' section in response to your comment (which seems to be gone, now).

Regarding open-ended blocks and `\`:

I introduce `\` because I want the behavior `o[A][B] == [AB]` but, mechanically, our processor cannot necessarily fit all of the first operand into its limited memory. We can only start printing `[A` and then, when we reach the end of that operand, figure out what to do next. If the next input is `][B...` that's great, just don't write the `][` to output and we're done. If not, we can only leave a mark in our stream so we can continue composition on a later pass. `\` is exactly that mark.

However, we have not lost the equivalence: `[A\` is just another way of writing `o[A]`. Thus, `\` is still a block delimiter, both formally and mechanically.
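As a quick sanity check of that equivalence (a derived identity, not an additional rule), applying both spellings to the same operand gives the same result via the compose rule:

o[A][B] == [AB]
[A\[B]  == [AB]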

for a PL forum LtU is really poorly written

How do you insert "\"?

I don't really understand this example. I'm trying to use a mental model of a fixed size window sliding over the code, but I don't see how the usage of `\` makes sense in this context.

If not, we can only leave a mark in our stream so we can continue composition on a later pass. `\` is exactly that mark.

If it's not `][`, then it's just more code before the `]`. How is it correct to insert a `\` in that situation?

Let's say the code we want to rewrite looks like `o[A1 A2][B]`, so that the end result we'd like is `[A1 A2 B]`. However, our fixed size processor can only hold `o[A1` in its memory.

So the processor chugs along, reads `o`, reads `[`, reads `A1` and then stops. I'm confused as to what happens here: there isn't going to be a matching `]` bracket here since we're in the middle of a block, so if we just write `\` now, then the stream is `[A1\A2][B]` which seems wrong.

I don't really understand how `\` ever gets in to the stream. If you're streaming a really big argument, it's gonna look like `[A...`, which means the "rest" of the code is `...B]`. Where are the `\` coming from? For example there are two rules like

d[A] ==
d[A\ == d

But if a processor is streaming code, won't it just see `d[A...`? Where did that `\` come from?

Finite state and bounded block depth

The processor has finite state. A finite sliding window is part of this, but the processor can also enter a state after seeing `o[` where it scans to the end of the block - simply outputting the block, not keeping it in processor memory - then rewriting the final `]` to `\` (or rewriting a final `\` to `\o`, as the case may be).

Of course, this tactic requires that we can identify the end of the block. A convenient and practical mechanism: we assume block depth is bounded by some absurdly large value, e.g. less than 2^64, such that we can use a fixed width counter to increment on `[` and decrement on `]` or `\`. When the counter decrements to zero, we're at the end of the block.
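To make the tactic concrete, here is a sketch of how the earlier `o[A1 A2][B]` example plays out across passes when the window cannot hold the first operand (the trace is illustrative, not quoted from the article):

o[A1 A2][B]
== [A1 A2\[B]     (first pass: drop the o, stream the operand through, rewrite its final ] to \)
== [A1 A2 B]      (later pass: the \ meets [, so \[ is elided and B's ] closes the block)

This is also where the `\` in the quoted rule `d[A\ == d` comes from: since `[A\` is just `o[A]`, dropping an open-ended block must leave a `d` behind to drop whichever block eventually closes it.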

Streaming Language

I am interested in the topic; however, I got lost at the first line.

a[A][B] == [B]A     (apply)

If it is "apply" shouldn't it be A[B]?

The next line has the notation in the order I'd expect:

b[A][B] == [A[B]]   (bind)

Did I misunderstand something? Is the program in reverse Polish notation or not?

Then how to read the more complex rules such as:

c[XY    == oa[a[o]w]c[X]c[Y

re apply

What you're looking for is the 'inline' operator: `i[A][B]` will give you `A[B]`. More generally, `i[A] == A`. We could define inline from the initial four combinators as `i = daa[[]]`. However, in SLRP we end up wanting `i` to be primitive.
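For anyone who wants to check that definition against the rules above, a hand derivation (each operator rewrites once its operands are literal blocks):

daa[[]][A]
== da[A][]     (a[[]][A] == [A][])
== d[]A        (a[A][]   == []A)
== A           (d[]      ==)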

The 'apply' operator from Awelon corresponds to "dip" in Joy (or Factor). But 'dip' doesn't fit my nice `a b c d` mnemonic, and I viscerally dislike the word almost as much as 'moist'. Anyhow, apply as defined is quite useful in practice, and has a nice symmetry with 'bind' (bind a function to first argument vs. apply a function to all but first argument).

Anyhow, apply is correct as defined. And yes, it's reverse Polish notation.

Assembly not machine code?

This looks like an assembly language, as surely a machine code would be a bitstream encoding?

machine code, not assembly

SLRP is an architecture (essentially, SLRP is stream rewriting in multiple passes by fixed memory processors). Like the von Neumann architecture, SLRP is not specific to any instruction set.

The machine code in the article is indeed a code we could directly implement in a physical processor. But it's also a didactic proof of concept. The design goal was not performance, but rather to explain the SLRP architecture, to prove Turing completeness, to demonstrate generality of the model, and (for anyone interested) to provide a viable starting point to discuss and design a more performant machine code.

So, you could encode the instruction set as a bitstream encoding if you choose to develop SLRP processors.

That said, I do wonder whether simply adding a lightweight stream-compression module to the processor (decompress on input, compress on output) might serve us better than attempting a Huffman encoding or whatever at the operator level. It's not something we could easily do in a language with jumps, but with stream processors the idea of hardware compression could work quite well.

Silicon Area

The area of silicon you use for compression would reduce the number of cores you can fit on a wafer, and so has a direct cost implication for production. This all of course presupposes a CISC architecture. In a RISC architecture the bits in the bitstream (instruction set) tend to correspond directly to the internal control lines of the processor, which saves on instruction decode logic, at the expense of instruction density.

The best option here might be RISC op-codes that get stream compressed, so the processor decompresses the incoming stream into an internal window of RISC op-codes, that directly switch the reduction logic.

re Silicon Area

There are definitely some tradeoffs involved. It's been a few years since I read much about compression. I vaguely remember a blog article on an adaptive bit-stream encoder that used only a very small window and a small amount of state. Its effectiveness wasn't super great, but not bad either.

At first I thought it was finite-state entropy but upon review that doesn't seem to be the case. I'm pretty sure it was a finite-state adaptive arithmetic coding.

Anyhow, I'm pretty sure we can develop a compression algorithm that has a small size on die and reasonable effectiveness for most input.

IIUC, the cost of moving code and data from memory to CPU is quite significant and is a big part of what motivates sophisticated logic like pipelining processor instructions. So if we can reduce costs via compression, we might accept simpler logic elsewhere. Basically what you're saying with compressed-RISC streams.

In any case, this isn't a domain that interests me too much, so I'll leave it to those passionate on the subject. There are entire forums for people who are into compression.

Copy, and Code with Data

The hardest part seems to be the divide and conquer for long arguments - in the case where we want to duplicate or find the fixed-point of a large function, for example. Am I right in saying that this would take several passes, dependent on the 'window' size of the processor?

It seems to me in every case you would be better off with a reference/pointer, and so although you could theoretically design a processor like this, it would be an academic exercise, and you would be better off designing a streaming processor that could use references for actual use (you mention some ways of doing this in the article). Further, for most cases combining code and data is not that useful, as it reduces the available stream bandwidth for data. Consider streaming network TV or NASA telemetry data as typical uses of streaming: sending the code with the data doesn't make sense.

What is the use case where sending the code with the data is a better option?

For more local use, combining code and data could lead to security issues. Where code is separate from data, we can validate that code, and have assurances that it will not do certain things. We are then safe to use data from any origin. Having said that, JavaScript in web pages is probably a good example of widespread combination of code and data. There would probably need to be some kind of sandboxing to make this work for general purpose (desktop) computing.

It is not clear to me how real-time IO would work, when the operating system may need several passes through the processor. I guess you would have to update the display once per pass? Maybe my brain just doesn't work right for this kind of design, but it seems like it would be hard to implement even simple UI elements.

re Copying

Am I right in saying that this would take several passes, dependent on the 'window' size of the processor?

Yes.

It seems to me in every case you would be better off with a reference/pointer

It's certainly more practical. That's why there's a section in the article on 'virtual memory' for SLRP, which replaces a region of the stream by a reference to the region. For practical performance, that's a very useful extension. (I call this 'stowage' in Awelon.)

(I'm separating the other title issue.)

re Code with Data

There are many use-cases for combining code and data: templating, procedural generation of data, representing infinite streams or trees as first-class values, adaptive structure where data can automatically integrate with its context, introducing spreadsheet-like properties to a filesystem, mobile software agents that can move to a remote processor and implement local decisions/filters/compression to reduce latencies and network overheads.

Bandwidth isn't a significant concern. Even within a process stream, binary data with sized headers can have negligible overheads. This depends on chunk size, but working with a few kilobytes at a time is no problem (a typical L1 cache on a processor today is tens of kilobytes).

Further, you seem to be assuming that code and data must be combined for external IO. In SLRP, our processors would typically interact with network or human-interface devices through side-channels - see the 'Effectful SLRP' section with the `![ChannelID][Request] => [Response]` operator. In most use cases, the channel ID, request, and response would all be embedded binary data. A streaming process that interacts effectfully with a network is not the same as receiving arbitrary streams of code from the network.

Regarding security:

The von Neumann architecture has a lot of difficulty with security mostly because 'addresses' are forgeable. This can easily be fixed using capability-based security. Unfortunately, use of addressing is extremely frequent and fine-grained in von Neumann systems, so the relative space-time overheads of securing addresses are too high. Instead, we resort to ad-hoc, coarse-grained security models such as page tables and protection rings.

A lesser issue is that it's very difficult to control what a user can do with an address, to protect system invariants.

However, this awkward security experience with von Neumann architecture doesn't generalize to other mixed code and data designs. SLRP works well with capability-based security. Virtual memory or channel IDs only require coarse-grained references. There's no pointer arithmetic. Both request-response channels and virtual memory references are structurally restricted in what we can do with them, so we can protect invariants. Overall, the cost of securing references at the hardware level is quite acceptable in SLRP.

A secondary security benefit for SLRP relates to rewriting semantics: there is no undefined behavior. When the processor sees a divide-by-zero or a type error or a request it cannot handle, it simply writes that code to output and moves on. Consequently, there is no need for external fault handlers and awkward non-local reasoning.

Regarding real-time processing and IO:

In SLRP, we naturally must control the stream size for real-time or embedded systems processing.

An intriguing feature of SLRP is that, beyond pipelining the stream and passive virtual-memory storage, we can also leverage the idea of virtual memory to partition an 'active' stream across multiple processors. This would allow us to compute each partition in its own tight loop, and stitch the streams back together when the result is required.

With clever partitioning schemes, we essentially support multi-threaded code within a stream. If we have more partitions than processors, we could feasibly time-share partitions on a processor. In case of heterogeneous systems (e.g. different processors control different devices) we can also migrate the partition to the processor that can handle a request. Thus, if we partition a specific request we would simulate remote procedure calls, and if we partition a loop we would simulate a mobile agent. Thus, SLRP plus virtual memory has convenient, intrinsic support for concurrency and distributed computing even without use of explicit network requests.

With that in mind, for real-time IO we don't necessarily need to control the entire stream. It's sufficient to control the size of specific remote or time-critical active partitions (threads) of a large stream. This requires precise, effectful control over partitioning, but that's entirely feasible.

Loops and messages

I was thinking more about how APIs would work where something has privilege, but only exposes a limited set of capabilities to the “outside”. Consider a file system with ownerships and permissions. The “driver” has complete low-level access to the device. This would exist, I assume, as a kind of “tape loop”: the code gets passed over again and again. For efficiency it would need to be “paused” at a rendezvous with another stream as we previously discussed. The second stream would then be a message, and the driver stream would check permissions, perform the actions and reply with another message.

So to make this work the processor would need to be able to have two input streams and two output streams at the same time. It would also need to be able to context switch. One of the input streams would have to be limited in some way so that it only serves as data to the other stream, otherwise it could take full privileges and break the security. I don’t see how this can work with code and data combined. Code and data together create a symmetry, whereas an API requires asymmetry: one side having privilege, the other not. If there is any way to pass code through the API we lose control of security.

The other issue here is context switching, how could it context switch without a stateful, memory address / pointer based approach? You can’t interleave the code easily without exceeding the window size.

re Loops and messages

If I understand correctly, you're essentially asking about how we model 'applications' separately from the 'OS' process. In SLRP, we wouldn't need to separate these things in the conventional sense. Apps can be directly embedded as block-values in the OS. Consider two approaches:

First option relies on facet pattern. We inject capabilities as an argument to our app - basically, instead of a `void main()` app model, we have `void main(caps)`. The injected caps can be restricted via facet pattern. Instead of a raw `[![BlockDeviceID]]` we can give the app some 'methods' that wrap the block device and restrict access.

Second option is based on monads or algebraic-effects. Our app evaluates to some form of request-continuation model. The OS wraps this monadic program with an effects handler that will interpret each request, provide a response, and continue. The important difference is that the `![BlockDeviceID]` is never injected into the application code, which results in improved control by the OS at some cost to performance.

Apps might be loaded from a block device or network. If from a distrusted source the app code might be sanitized or compiled by the OS before it is loaded. For scalable systems and performance, we could leverage the 'active' virtual memory partitions that I described above. This would allow each partition to compute in its own 'loop' if we have enough processors that time-sharing is not required.

Note: I'm not using any synchronous rendezvous here. No pausing streams. No assumption of multiple streams. I'm assuming only the original SLRP as described in the article, including side-channel effects and virtual memory references.

One of the input streams would have to be limited in some way so that it only serves as data to the other stream, otherwise it could take full privileges and break the security.

No. We're not using protection rings with 'privileges'. Capability-based security and protection rings/privileges are diametrically opposed. It's easy to blend code with different authority in a capability-based system.

The question to ask for capability-based security is whether we can forge capabilities. If we can't, then there's no risk of takeover. Protecting against forgery isn't difficult. One option is to sanitize apps, perhaps forbidding direct use of the `!` operator so the app is pure until it receives caps from the OS. Another option is to secure the resource identifiers against forgery, e.g. add a random 64-bit entropy suffix to channel IDs so that guessing at caps is infeasible.
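Purely for illustration (the identifier format below is made up, not part of the article's machine code), such a secured capability is still just an ordinary effect channel - its ID simply carries enough entropy that it cannot be guessed:

![BlkDev:a91f3c57d02e84b6][Request] => [Response]     (the suffix is random, so forging the cap means guessing 64 bits)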

The other issue here is context switching, how could it context switch without a stateful, memory address / pointer based approach? You can’t interleave the code easily without exceeding the window size.

We don't need context switching, or at least not a conventional notion of context switching. In SLRP, there's no protection ring context, no page table context, no register context. And we don't need to interleave code at operator-level granularity. At most, we might need to time-share a processor for several streams, but that isn't so different from working with one larger stream of form `[Stream1][Stream2][Stream3]...`.

Also, the processor sliding window (via output buffering) is simply an optimization, to focus processor attention on parts of the stream that need more work. In practice, there'd be a quota in case of tight infinite loops (like `z[i][]`) so we only cycle so many times before forcibly moving on. So time shares will always predictably progress.

Security and Priority.

First option relies on facet pattern. We inject capabilities as an argument to our app - basically, instead of a `void main()` app model, we have `void main(caps)`. The injected caps can be restricted via facet pattern. Instead of a raw `[![BlockDeviceID]]` we can give the app some 'methods' that wrap the block device and restrict access.

Yes, that makes sense; I understand the capability model as opposed to ambient authority. This processor is going to be very complex compared to the traditional one, as it will need to provide capabilities to enumerate devices etc. in hardware. Effectively the BIOS would need to be hard-coded into the CPU, unless you had a "normal" von Neumann CPU running a micro-code program to do device enumeration and manage capabilities.

This architecture seems to have a "bootstrap" problem. Things like capabilities are traditionally managed by the OS kernel because they are too complex to implement in hardware. We want to minimise the complexity of the hardware because a "bug" in a mask set can cost $100,000+ to put right. The von Neumann architecture can be implemented in tiny chips like the 6502. Without it we would never have had computers with which to develop more sophisticated algorithms, so perhaps my "fundamental" comments could be justified after all? But to be fair the 6502 had no memory protection, and tended to be used in systems without multi-tasking. Even so, it seems to merge system architecture and OS function with CPU architecture, which would seem problematic.

Also, the processor sliding window (via output buffering) is simply an optimization

I don't see this - you need to see the operator and its arguments to do a reduction. If we have arbitrarily long arguments we need to split them into chunks, which means holding the operator whilst streaming the arguments. With multiple interleaved streams we need to hold an arbitrary number of operators and context-switch between them, which probably requires spilling to a stack.

This also only seems to work for equal-proportion timesharing; there is no easy way to have dynamic prioritisation where the priority of a process changes with time. It seems like it would be difficult to have a usable desktop system.

re hardware caps and sliding windows

Hardware caps aren't nearly as complex as you make them out to be. Mostly, it just requires a few extra entropy bits to interact with resources. Or tag bits, if we have a machine code that can prevent forgery of references. Sure, there'd be a few $100k hiccups when transitioning to the model. But I don't see how it would be a problem long-term.

Bootstrapping also isn't particularly difficult. We could still have a separate BIOS or firmware that injects an initial stream and gets the cycle started. We would provide some capability arguments to the OS like the OS would do for any other application, e.g. a simple binary cap to query and discover other local hardware caps.

SLRP processors only need to know the side-channels they're physically wired to, so they can filter which requests they handle vs those they ignore and output unmodified. This is useful for heterogeneous systems, where not all processors are attached to the same devices. Processors could determine this by query. It doesn't need to be hard-coded. This implicit 'routing table' is not directly exposed to the SLRP process, not even to the OS. So it isn't a security risk, and it's disentangled from bootstrap issues.

you need to see the operator and its arguments to do a reduction

Reduction requires an input buffer. I can see how you might call that a 'sliding window'.

But to clarify, in our discussions on SLRP so far, I've used 'sliding window' only to refer to a loop within the processor wherein we buffer *output* then push it back to input, allowing multiple smaller passes within a processor for each full pass over the stream. This allows the processor to focus more attention and effort where it's most useful, such as tight loops or arithmetic. The 'window' refers to the volume of focused processor attention within the larger stream. It's a very useful optimization, but not essential for SLRP systems.

With multiple interleaved streams we need to hold an arbitrary number of operators and context-switch between them

Why would you interleave multiple streams? Before, you were imagining interleaving was needed for synchronous rendezvous. However, I've been telling you that synchronous rendezvous is not a thing for SLRP. We don't need multiple interleaved streams, at least not at fine granularity. We can interleave streams at a coarse granularity by doing all of [A] then all of [B] then all of [A] then all of [A] then all of [B] and so on. (Doesn't need to be homogeneous round robin.)

An SLRP processor/core will handle one stream at a time, from end to end, without context switching in the middle. This allows for very simple pipeline processing. For priority, it is sufficient to process some streams more frequently than others, or on dedicated processors, or to pipeline them through more processors. We can also partition a large stream into smaller streams that can be prioritized independently.

This also only seems to work for equal-proportion timesharing; there is no easy way to have dynamic prioritisation where the priority of a process changes with time.

Remember, one interesting thing we can do with a SLRP stream is to partition a large stream and process the partitions separately, even remotely. This is feasible due to local rewrite semantics (relevantly, 'control flow' is not part of SLRP semantics). We only need some means to track the logical containment of partitions and stitch things back together when needed. Virtual memory references work for that.

Thus, although we may have one big OS stream that logically contains our apps, we could physically arrange each app into its own partition. If we have more partitions than processors, we'll need to timeshare some processors. But we could easily process some partitions more frequently than others, or use dedicated processors for high priority foreground apps while all background computations share a processor.

I don't understand why you believe prioritization would be particularly difficult. The physical aspects are simple enough. So that just leaves software management (partitioning, quotas, etc.) which we can make explicit via effects to any degree desired.

Unfamiliarity

I am sure at least 50% of my concerns are due to unfamiliarity with the model.

I don't get how short streams can work. How do the streams communicate with each other? If all the streams are separate and the processor is stateless, then no information can be shared.

For example I could imagine this as an event based system where an event (say a keypress) streams the keyboard handler, which streams application handlers, but these would all need to modify some application state external to the stream (say the document being edited).

I could also imagine one large monolithic stream containing the OS and all running applications.

Perhaps you could explain the context switching further?

Where would these streams be stored, and how would they be selected for streaming?

My point about prioritisation was assuming interleaved streams where the interleaving would be fixed in the code. If you can run short streams fast enough then it won't be a problem, although I don't know how you would pick the streams? This would seem to require pointers or names to refer to the stream to pick next.

If the requirement is no pointers then all the sub-streams need to be explicitly included in the OS stream, and hence why I was thinking of interleaving.

re unfamiliarity

How do the streams communicate with each other?

Ultimately, in SLRP, there are two ways to model interactive computation. First is purely functional: we can model free monadic interactions by yielding command-continuation pairs as first-class values. This enables a process to communicate with its "host" which is probably another SLRP process. This method is only suitable for hosted processes, but could support "inter-process" communication insofar as the host explicitly routes messages. By nature, pure interactions would be synchronous and deterministic. Second, streams may interact effectfully through external network or storage devices, assuming they possess the necessary `![ChannelID]` capabilities to make requests. Because rewrites are opportunistic (no exception for effectful rewrites), any such interactions would naturally be asynchronous.

I could imagine this as an event based system where an event (say a keypress) streams the keyboard handler

Ah, no. If I understand correctly, you're imagining that every device such as a keyboard is streaming SLRP code (and somehow merging this keyboard stream with the SLRP process stream). However, SLRP doesn't impose such requirements on external devices. I would imagine instead that a keyboard emits an ad-hoc binary input stream, perhaps based on the USB-HID spec. This stream could be buffered and accessed incrementally via effectful rewrites of form `![KbdChannelID][Request] => [Response]`. The response would contain binary data to be interpreted locally. This isn't very different from conventional approaches: we have a programmable process driving interaction with external devices.

Perhaps you could explain the context switching further? Where would these streams be stored, and how would they be selected for streaming?

How does one von Neumann processor configure the actions of other processors on possibly distributed machines? The answer to this question is varied enough to write a book on the subject. Same would be true for SLRP.

Broadly, we can either develop an effectful API with external devices that allows us to explicitly configure which processors are doing what, or we can just register code somewhere that other parts of our system implicitly discover using heuristic work-stealing and migration models. Or an ad-hoc mix of implicit vs explicit. Whatever.

My point about prioritisation was assuming interleaved streams where the interleaving would be fixed in the code.

There is no synchronous interleave with other SLRP streams. The closest we get is indirect, asynchronous interaction via external devices and effectful rewrites, or ad-hoc time-sharing of a processor using hardware virtualization techniques.

If the requirement is no pointers

SLRP doesn't have a "requirement is no pointers". To clarify, there's a rather significant logical distinction between "SLRP does not require that we use pointers" and "SLRP requires that we do not use pointers". Optional vs forbidden. I've never argued for the latter. Pointers can still be convenient, useful, and efficient even if we don't strictly 'need' them for computation.

Effectful interactions with a conventional storage device (with volumes and blocks), or IP-layer network communications (with IP addresses and sockets), or immutable virtual memory references (via constrained API like `![VMLoad][VMRef] => [Block]` and `![VMStow][Block] => [VMRef]`) would each involve 'pointers' under some loose hand-wavy interpretation of the word. To avoid references at the storage or network layers would require a global paradigm shift toward type or topic based storage and delivery models (and even then we could still simulate refs).

All application streams would be logically included in an OS stream, but effectful use of virtual memory (`![VMStow][Block] => [VMRef]`) does not require the `[Block]` be fully evaluated - in general, it could be treated as a parallel or lazy computation. Thus, the app within a `[Block]` could be physically separated from the OS and processed separately. An extended API involving the `[VMRef]` could feasibly tweak prioritization or scheduling of this computation by external processors and devices.
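For illustration (block names here are just placeholders), the virtual-memory round trip that makes this possible looks like:

![VMStow][AppBlock] => [AppRef]     (the app physically leaves the OS stream; only a small reference remains)
![VMLoad][AppRef]   => [AppBlock]   (stitch the app back in when the OS needs its result)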

Interleaving.

I am not sure how we can do prioritisation without pointers, as without pointers we have to interleave the streams as they are fed to the processor.

Some external hardware could do this, but this in turn would require some kind of pointers or naming to read the streams in for interleaving dynamically.

To rely on external hardware does not remove the problem, it simply declares it out of scope. The "pointers/names" are still needed to make the system work.

So it appears that although we can achieve pure functional computation without pointers, and maybe emulate some simple stateful / event based computation, there are some tasks that we cannot do without them, like dynamically multiplexing processing streams. This boundary between what can be done without pointers and what requires them seems interesting. It suggests that pointers/names are not just an improvement in expression, but also increase what can be expressed. They change both syntax and semantics.

This suggests to me that naming/pointers (a name is just a symbolic pointer) are a strong candidate for the first tier of abstraction above combinatory logic. I still think combinatory logic with no means of naming or structuring seems to be the bottom tier of the computing abstraction hierarchy.

re prioritization without pointers

You seem to be assuming a particular mechanism for prioritization similar to thread schedulers. In this approach, we have a 'scheduler' that scans a selection of streams in addressable storage then chooses one and reads it into the SLRP processor. Your argument is that the scheduler needs to track indices into addressable storage, which looks vaguely like pointers (even though it's second-class).

However, that is not the only approach to prioritization.

If we do not have the ability to reference streams externally, then prioritization can only be based on in-band characteristics: headers, timestamps, deadlines, energy models, or other cues. If we don't have unbounded addressable storage, then we can only select among a static set of inputs, e.g. based on physical wiring.

For stream-processing, headers are particularly convenient. Our physical computation layer may consist of a network of cores or processors, with routing nodes. A routing node would have multiple inputs and multiple outputs (albeit, a finite, static number determined by physical wiring), and would know enough to distinguish headers and route them. If two incoming stream headers indicate the same output route, the routing node must select one stream to deliver first (perhaps influenced by a priority field). The stream rewrite processors still only need one input and one output channel (modulo side effects).

In this context, different routes will have different performance characteristics - different numbers of processors, different strengths of processors, different levels of contention, etc. Thus, choosing a route would prioritize the stream. Further, in case of contention, we might still prioritize some streams over others.

For 'interleave' it's feasible to divide a large stream into fragments (analogous to slicing a TCP stream into packets) that can be routed and processed independently then stitched back together at this 'network' layer. This might require some GUIDs for stitching streams, but it's very different from references to an addressable storage or virtual memory. For example, it wouldn't support pass-by-reference within the computation.

In any case, we have more freedom on the subject of prioritization without pointers or addressable storage than I think you've imagined. It's just going to be closer in nature to network traffic prioritization than to conventional thread schedulers.

This boundary between what can be done without pointers and what requires them seems interesting.

Sure. References or pointers can be quite useful. And without sufficient support, there are some things we cannot do - such as ideas ultimately defined in terms of refs like scheduler-based prioritization, or existential key-value lookup tables that interact conveniently with a garbage-collector (aka ephemeron tables).

However, there are a lot of *other* abstractions that can be similarly characterized. Dynamic allocation vs static allocation is certainly an interesting boundary. Reversible computing vs irreversible. Concurrency. Observable non-determinism. Various levels of 'termination' up to general recursion. Even static location vs mobile processing or computation is intriguing in context of pervasive computing, smart phones, or augmented reality. Etc.

If your goal is to develop a set of interesting boundaries for computation models, I agree that refs should be included. But I'm not convinced that the resulting set of boundaries should be classified into "hierarchical tiers" the way it seems you're imagining. Should concurrency be above pointers or below them? Does it even make sense to ask that question?

I'm more inclined to use "dimensions" when thinking about these features of languages or other abstract models. IMO, we should only argue for a 'hierarchy' insofar as we discover provable, acyclic dependencies between concepts.

Multiplexing nodes and Dimensions

I am aware of being able to multiplex streams based on hardware connections; this is in fact how the stream-based system I was involved in designing worked, and as I mentioned above it had a hyper-cube architecture. Realistically, for general purpose computation this is not good enough: either you are severely limited to running only one or two processes at a time, or you waste a lot of hardware that rarely gets used (increasing the cost). The property we want in time slicing is that the available resource is always used efficiently, and this is a property that multiplexing hardware nodes does not have, so commercially this would never work.

I also see your point about dimensions. I was only referring to a hierarchy because of the original topic of this topic. However I do think there is an abstraction hierarchy, we can see it in mathematics.

First I just have the things themselves, my sheep that I give to the local deity. Each sheep is a unique individual. Being able to name them “sheep” and count them is clearly a paradigm shift in abstraction from simply having the unique individuals. I can now keep records of donations to my deity, who made those donations, how many sheep, etc.

Then the scribe tallying the sheep wants to know how many sheep people donate on average (to tell if a particular donor is to be praised or reprimanded), which involves a complex operation like division. This is hard, and requires training, so being able to write down this “algorithm” is again a paradigm shift.

After a while the number of known algorithms becomes large. Algorithms can be grouped and their common requirements factored out, resulting in families of algorithms that can be selected and ordered by these common properties. Things at this point are pretty abstract, so a concrete example may help - sorting algorithms grouped by memory access pattern: single pass forward iteration, multi pass forward iteration, bidirectional iteration, indexed (the distance between two iterators can be calculated), random access.

There is a clear hierarchy of abstraction here, as each one builds on the one below and would not be possible without it. Mathematics itself would not be possible without the abstraction of naming, because without naming we cannot count sheep, because there is no such thing as a “sheep”, it is a name applied to certain physical phenomena that are within certain tolerance of a referenced norm. No names, no references, then no arithmetic. No arithmetic then no algorithms. No algorithms then no generic categories of algorithms.

re counting sheep

Realistically, for general purpose computation this is not good enough: either you are severely limited to running only one or two processes at a time, or you waste a lot of hardware that rarely gets used (increasing the cost)

Regardless of whether your assertion here is valid (I wouldn't assume your experience generalizes to all network architectures or is fundamental and unsolvable just because you failed to solve it), it still qualifies as prioritization without pointers. Bringing up hardware utilization is just moving the goalposts.

I was only referring to a hierarchy because of the original topic of this topic.

No, that was a different thread. This one's just introducing SLRP.

First I just have the things themselves, my sheep that I give to the local deity.

AFAICT, the existence of something is not a dependency for abstraction. Consider the "local deity" in your example. We can talk about counting dragons and lightsabers just as easily as we talk about counting sheep.

Being able to name them “sheep” and count them is clearly a paradigm shift in abstraction from simply having the unique individuals.

You've made a category error here. The word 'sheep' is not directly an abstraction over individual sheep. It's technically an abstraction over the description or definition of sheep within a language (perhaps something like "a domesticated ruminant animal with a thick woolly coat and (typically only in the male) curving horns. It is kept in flocks for its wool or meat, and is proverbial for its tendency to follow others in the flock.") There are ostensive definitions based on a sample set, like a biological form of machine learning. But even if defined ostensively, 'sheep' is ultimately a linguistic abstraction. Even if we don't have the ability to name them 'sheep' we could still talk about them in the abstract or count them.

This distinction is more obvious with formal languages, like regular expressions where we can cleanly describe a pattern like `'0'|[1-9][0-9]*` separately from giving the pattern a name like `Nat`. Similarly, we could count all instances of this pattern in some scope without ever needing the name Nat.

If we wanted to count all images of sheep in a filesystem, it would require considerably more effort to represent this pattern. Indeed, this pattern may require various algorithms for: how to recognize an image, how to recognize a sheep within an image, how to traverse a filesystem, and how to count.

being able to write down this “algorithm” is again a paradigm shift

Sure. Our ability to describe algorithms requires some language features (and ways of thinking) that are distinct from a language that does not support algorithms. But, relevant to my earlier point, this is not a hierarchical shift. That is, a language that can represent algorithms like counting sheep doesn't strongly require or imply the ability to define words like 'sheep' or 'Nat' or 'successor'.

Without such an ability, of course, we might be stuck repeating a multi-megabyte algorithm every place we want to use it. This isn't very efficient. Perhaps there are alternatives to naming involving graph-structured syntax or pattern-matching within an environment. But regardless, language abstraction via defining words is convenient and useful. However, the question was never whether names have value, rather whether the ability to assign names is necessary for describing or abstracting ad-hoc algorithms.

There is a clear hierarchy of abstraction here, as each one builds on the one below and would not be possible without it. Mathematics itself would not be possible without the abstraction of naming, because without naming we cannot count sheep, because there is no such thing as a “sheep”, it is a name applied to certain physical phenomena that are within certain tolerance of a referenced norm. No names, no references, then no arithmetic. No arithmetic then no algorithms. No algorithms then no generic categories of algorithms.

Pretty much every clause in your conclusion is simply untrue. There is no clear hierarchy of abstraction. It is not impossible to have algorithms without naming. If we don't have a name for "sheep" then we could instead describe our "certain physical phenomena that are within certain tolerances" in place of the word "sheep". Our ability to represent arithmetic, algorithmic behavior, or meta-algorithmic abstractions in a language does depend on language features. But naming doesn't need to be one of those features.

If you want to argue that language or math would generally be inefficient or impractical without naming or at least some form of separation model that can fill the same role as reference vs referent, that's fine. You could make a strong argument for that. But it'd still be wrong to conflate 'possible' with 'practical'.

Named categories.

Sure, "sheep" is a pattern match filter, except it's not it's the name of that filter. It's not really the kind of filter that you can write down, so it would be pretty useless without a name. But you miss the point, numbers and counting _require_ abstraction.

Remember, abstraction is not considering all the details. If we look at all the details, every sheep is a unique individual, subtly different in some way. We literally abstract away those differences when we group similar objects into categories like sheep. Without this abstraction there is no counting, because everything is totally unique; the only numbers would be one and zero, because you either have the unique entity or you do not. In order to count things we need the ability to categorise, and the categories need names to organise them. So named categories are an abstraction, and a necessary step on the way to counting, arithmetic and algorithms. I would argue you cannot have algorithms without the concept of named categories. I think what you have done is made the category names implicit, but they still have to exist in the mind of the programmer, or they would not be able to write the program.

counting black sheep

Names are indeed convenient for organizing. But there are plenty of countable abstractions that never receive an independent name. There is no dependency on naming in a true, formal logic sense.

For example, if I say "count only blue cars", we could consider 'blue cars' an abstraction. But 'blue cars' here is not a name any more than any other composite description of filters or lambdas. Given a complete set of primitives, the ability to compose descriptions and filters is provably sufficient.

Of course 'Blue Cars' can also be a name (cf. https://bluecars.nz - a place for rental of electric vehicles). Even 'Counting Blue Cars' can be a name (a song by Dishwalla). Naming isn't by nature descriptive. Rather, it's associative. The ability to associate a short name to a large description is very convenient. We would call this behavior a 'dictionary coder' in context of compression, a technique that reduces the cost of repetitious communications.

Yet, there is no hard upper bound on the size of an anonymous description we can use in counting or other algorithms without the convenience of naming. From 'blue cars' we could move to 'blue cars with broken windows' or 'blue cars with broken windows and four-cylinder 2.4L engines'. You suggest that names "have to exist in the mind of the programmer", but I don't believe this is the case: we name things at our convenience, but AFAICT we humans don't bother to name every abstraction we work with. At most, you could argue that the convenience of naming is necessary in context of *human* limits (economics, attention, memory, lifespans, the low data-rate at which humans communicate, weakness at learning and applying other compression algorithms that machines might be able to use easily, etc.). But this would be a very different argument from your claim that naming is "a necessary step on the way to counting, arithmetic and algorithms" at a fundamental level, independent of humans.

Things like capabilities are

Things like capabilities are traditionally managed by the OS kernel because they are too complex to implement in hardware.

There have been a number of tagged memory computers, and there's CHERI and SAFE.