The evolution of Rust

Graydon Hoare is the original developer of Rust, having started the language before Mozilla adopted it. For the 1.0 release he prepared a lightning talk on how the language changed over its roughly ten years of development.
He published only bullet points, but even the topic list is interesting:

  • Six ways Rust is fundamentally different from how it started
  • Six ways Rust is fundamentally the same as how it started
  • Six things we lost along the way
  • Six things we gained along the way
  • Six things I'm irrationally, disproportionately pleased by

Read the full blog post for the content of the five lists.

'Six things we lost along the way'

It's possible to be sad about each of these:

  1. Typestate system
  2. Effect system
  3. Function complexities: parameter modes, argument binding, stack iterators, tail calls
  4. Language-integrated runtime for tasks, channels, logging
  5. GC pointers, task-local GC (yes, rustboot had a real mark/sweep)
  6. Dynamic, structural object types

silver linings?

I guess part of the reason was usability testing: some aspects of those features were deemed "too much". Consider what the Rust folks said about Pony: nice, and we had that stuff for a while, but we dropped it as overly complicated for real programmers.

So the question is: are those things worth the extra effort? And/or can the effort of thinking about them and dealing with them in the syntax be brought down low enough to make them a good-ROI feature?

This question has no proper

This question has no proper answer. Ideas "work" in research, and then later they "work" in practice. Sometimes they just need to work better before they are ready for prime time, and so we researchers just keep banging on them!

But if you are designing a language for prime time, evaluate and pick the stuff that works today. The Rust team picked what was thought to be ready, found that much of it wasn't really, and still came away with a pretty innovative language.

Effect Inference

Did Rust have effect inference, or was the programmer expected to annotate code with all its effects?

You mean Koka? I always

You mean Koka?

I always understood the problem to be that you hit top way too quickly with effects (i.e., the function is either pure or can do anything). Incidentally, monads have the same granularity problem.

Fine grained effects

Well, an effect could be "reads this object" or "writes to that object". Without inference, such fine-grained effects become unmanageable, and things tend to get grouped into a do-everything effect like Haskell's IO monad.

Top at the top

Effect systems needn't have a top, but I gather you're saying that by the time you get to the top you'll have a crazy number of effect annotations. I think the solution is to make effect annotations optional. The type will only be especially gnarly near the top level of your application where you've exposed an "application" that does all sorts of things. Anywhere else, it's a useful concept.

Functional encapsulation

In my system effects are inferred, and you can write effect constraints. So you can annotate a function to say that no effects can escape, or that only IO to the terminal can escape.

Some effects can be contained. For example consider a local store (stack based). For the duration of the function there may be effects on that store, but outside the function we are guaranteed effects do not escape (the store is constructed on entry and destroyed on exit).

Hence in my system the top level could be an effect-free pure functional program, but the functions and objects may be impure internally.

The top level effects of a file copy program would be something like:
(File x, File y) -> Eff {Reads x, Writes y} ()
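A loose Rust analogy of that signature, not the poster's effect system: the "effect row" is approximated by the capabilities the function receives as arguments, so a copy routine handed only a reader for x and a writer for y cannot perform any other I/O. The function name `copy_stream` and the in-memory reader/writer are illustrative assumptions.

```rust
use std::io::{self, Read, Write};

// The only effects available here are "reads x" and "writes y",
// because those are the only capabilities passed in.
fn copy_stream(x: &mut impl Read, y: &mut impl Write) -> io::Result<u64> {
    io::copy(x, y)
}

fn main() -> io::Result<()> {
    let mut src: &[u8] = b"hello";      // any reader stands in for File x
    let mut dst: Vec<u8> = Vec::new();  // any writer stands in for File y
    let n = copy_stream(&mut src, &mut dst)?;
    println!("copied {} bytes: {:?}", n, dst);
    Ok(())
}
```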

I would say there isn't much

I would say there isn't much middle ground between pure and opaquely impure, even with nice region abstractions. I would guess that to do better, we would need more...subtyping. But this isn't my field.

Parametric Effects

Not sure why it needs subtyping. The effect system uses row polymorphism, so it's effectively a type-level set. Combining sub-programs unions their sets of effects. Effect constraints succeed based on set operators (member, not member). This doesn't need subtyping, as the effects are parametric, so matching takes place by unification.

Subtyping allows us to hide

Subtyping allows us to hide extra information when safe, it is not the only way to do that but it is the most direct. Parametricity is quite orthogonal to subtyping (the ability to have details open vs. provide more details than expected).

Parametricity vs Subtyping

I see what you mean now. You could hide the effects by some kind of subtyping? That is different from what I am suggesting, which is that by proving the effect does not escape some closure/container, we can erase the effect from the type. Some proofs are trivial, like showing that mutation effects on an object cannot escape the scope that contains that object.

But that local state that

But that local state that doesn't escape, why is it even there? It just seems like bad style: use mutable state when you really need it, vs. transient state that would be better off done functionally anyways. So I just don't think this problem exists.

Performance

Let's say I have a fast (non-allocating) in-place mutating sort algorithm. I can allocate a temporary mutable object, copy the data in, run the mutating sort, and then return the output object. This is totally safe; further, if we can prove the input is not re-used, you don't even need the copy (if, for example, the input is a temporary).

Edit: I think the way this should work is that if the only use of the input can be proved to be the mutable sort, it's safe; otherwise the code would fail to compile and you would have to explicitly copy the input before calling the function to pass type checking.
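For what it's worth, a minimal Rust sketch of exactly this discipline (the function name `sorted` is illustrative): the function takes ownership of its input and mutates it in place, so a temporary is moved with no copy, while a caller that still needs the original must clone explicitly, making the cost visible at the call site.

```rust
fn sorted(mut v: Vec<i32>) -> Vec<i32> {
    v.sort(); // in-place, non-allocating mutation
    v         // returned as an ordinary value
}

fn main() {
    let temp = vec![3, 1, 2];
    let s = sorted(temp);          // temporary: moved in, no copy needed
    // let again = temp;           // would not compile: `temp` was moved

    let keep = vec![9, 7, 8];
    let s2 = sorted(keep.clone()); // explicit copy when the original is reused
    println!("{:?} {:?} {:?}", s, s2, keep);
}
```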

You could go the other way

You could go the other way and optimize a functional encoding, rather than determining that an imperative encoding was functional. I also don't see the optimizations being very significant in the overall scheme of things.

Hides cost, Cannot change algorithm

There are at least two problems with that approach. First, it hides the cost from the programmer, so they cannot make informed performance decisions, and it becomes a game of guessing what weirdness will encourage the optimiser to output the kind of code you want. Second, optimisers are just not that good: I am not aware of an optimiser that will swap a read-only bubble sort for a mutating quicksort. The issue is that the data structures and algorithms for the read-only and mutating cases are very different, and there is a significant performance gap. I don't believe optimisers can get good enough, because it is fundamentally a search problem (finding alternative algorithms) with an unbounded search space.

A quick example of a mutating algorithm that is difficult to match functionally is disjoint-set union-find, particularly with weighted union and path compression.
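For reference, a short Rust sketch of the mutating structure being referred to, with union by size and path compression; the type and method names are illustrative.

```rust
struct DisjointSet {
    parent: Vec<usize>,
    size: Vec<usize>,
}

impl DisjointSet {
    fn new(n: usize) -> Self {
        DisjointSet { parent: (0..n).collect(), size: vec![1; n] }
    }

    fn find(&mut self, x: usize) -> usize {
        let p = self.parent[x];
        if p != x {
            let root = self.find(p);
            self.parent[x] = root; // path compression mutates in place
        }
        self.parent[x]
    }

    fn union(&mut self, a: usize, b: usize) {
        let (mut ra, mut rb) = (self.find(a), self.find(b));
        if ra == rb { return; }
        if self.size[ra] < self.size[rb] {
            std::mem::swap(&mut ra, &mut rb);
        }
        self.parent[rb] = ra; // weighted union: smaller tree hangs under larger
        self.size[ra] += self.size[rb];
    }
}

fn main() {
    let mut ds = DisjointSet::new(5);
    ds.union(0, 1);
    ds.union(3, 4);
    println!("{}", ds.find(0) == ds.find(1)); // true
    println!("{}", ds.find(0) == ds.find(3)); // false
}
```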

Protected Optimizations

Optimizations should be predictable and simple to avoid voodoo programming experiences. I agree that ability to make 'informed performance decisions' is valuable. I don't believe optimization "from a read only bubble sort to a mutating quick sort" is a good idea. But I also don't believe it's necessary for the 'optimize a functional encoding' approach.

Minimally, if we reliably supported linear vector values, where computing an updated vector may be implemented by mutation (rather than by copying the whole vector with one changed value), FP could directly express most imperative algorithms within a functional context. This can be an easy optimization to recognize and protect (by type or annotation), and it is predictable in its performance and behavior.
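A minimal sketch of that linear-vector idea, written in Rust rather than in any of the systems under discussion (the function name `set` is illustrative): the update consumes its argument, so the sole copy of the vector can be mutated in place while the caller still sees a pure value-to-value function.

```rust
fn set(mut v: Vec<u32>, i: usize, x: u32) -> Vec<u32> {
    v[i] = x; // in-place write; no whole-vector copy
    v         // the "updated" vector, returned as a value
}

fn main() {
    let v0 = vec![0u32; 8];
    let v1 = set(v0, 3, 42); // v0 is consumed (linear use), v1 is the update
    println!("{:?}", v1);
}
```

If the vector were shared rather than uniquely owned, this API would have to fall back to a copy, which is exactly the property a type or annotation would protect.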

Also, due to the popular myth of O(1) RAM, a great many imperative performance analyses are unsound (even in peer reviewed papers). Union-find algorithms are no exception. I do believe mutating algorithms provide savings, but not as much as naive asymptotic analyses might indicate.

Benchmarks

You can only believe actual performance measured on a target machine. Algorithm complexity is good for algorithm choice but not optimisation.

The best (performance, and possibly understandability) algorithms are (and in my opinion always will be) imperative.

I am using linear types as part of the type system. The difference between ML and imperative is really just one of style: force annotation of values with const, default to references, and add some syntactic sugar to allow unary functions.

As to the vector, this is precisely the use case that I am looking at. You want to be able to define a vector using high performance code that is close to the machine model, and use it in a safe functional way. The main thing I want is both these things in the same language so you can define new vectors optimised for different problems, as well as sets, maps, and all the different containers without having to rewrite the compiler every time.

The CLR performs many

The CLR performs many optimizations that matter (e.g. polymorphic unboxing), but it is a policy not to document them, so you can't depend on them. So what I can do is either hope, or write my code not to depend on them; if it really matters I take the latter approach.

So if you can guarantee optimizations in a way that is easy for developers to reason about, then that would be a big deal. It seems like a good focus for the space Rust is in. By definition, these optimizations need to be local and relatively simple on the analysis side, or they need to be backed by non-modular perf asserts.

Targeted Optimisations

Yes, I think that to document the optimisations the language has to be able to express the optimised form. That way you can write pre-optimised code, or you can write high-level code plus a specific set of optimisations. You can write domain-specific optimisations as transforms, and you can use the logic proof engine to force optimisations. This could, for example, force all loops to be tail-call optimised; compilation would then fail if any loop is not optimisable in this way. The proofs of optimisation should work the same way as the proofs for effect containment.

the paradigm imperative

The best (performance, and possibly understandability) algorithms are (and in my opinion always will be) imperative.

As long as there's mutation in how the bare-bones hardware works, imperative presumably must have a potential performance advantage since it can get closer to the hardware; if some future technology development leads to ubiquitous computing hardware that doesn't involve mutation, that would likely change.

I too favor imperative, at least in potential, on understandability, but note that doubt must always linger about a subjective judgement like this whose list of options is necessarily unbounded. It's one kind of judgement to choose between (what we now understand as) imperative and functional paradigms, quite another to choose between current imperative paradigm and all other paradigms that somebody might devise in the future.

Operation cost

I think special purpose hardware could be developed that runs functional code better than it runs imperative code, but I think this will be an intellectual curiosity. I do not think the fastest functional code on the fastest functional hardware will challenge the fastest imperative code on the fastest imperative hardware. I think the fundamental operational cost of allocation is higher than that of mutation.

Also, highly performant imperative algorithms can be quite simple (for example union-find with weighted union and path compression), whereas performant functional algorithms involve complex data structures like finger trees, and often hide mutation inside things like efficient versioned strings. Not that I am against hiding mutation, but I think it's better to be honest about it. Hence a language in which you can write a versioned string with a change list and a mutated current version, but package it as a functional read-only value.

I think special purpose

I think special purpose hardware could be developed that runs functional code better than it runs imperative code, but I think this will be an intellectual curiosity.

With foreseeable technology. I meant to allow for the unlikely but imaginable contingency that some hitherto unforeseen technological trick makes it possible to build vastly faster hardware but with the operational property of being non-mutational. So that the way we program changes to fit the computational means available. I doubt this will happen, or if it does I doubt it'll last beyond the next technological revolution after that, because I think physics, the substratum that the hardware in turn is built on top of, is imperative.

re: physics imperative

I believe physics is much closer in nature to a high-dimensional cellular automaton than it is to imperative. In many senses, that's a declarative paradigm (similar to a temporal logic), albeit with some strong modal properties, locality laws, and constraints against allocation.

Further, I expect it's a reversible cellular automaton (conservation of energy and information), albeit with reversal requiring a larger neighborhood than the progress function (thus creating the arrow of time). :)

Hyperedge replacement grammar

I think physics is even closer to a hyperedge replacement grammar or port graph grammar (the latter being a formalism I named [1], based on Bawden's Connection Graphs [2,3] and Lamping's partial-sharing graphs [4,5,6]).

Refs

[1] Charles Stewart. Reducibility between classes of port graph grammar. Journal of Computer and System Science 65(2): 169–223. Academic Press, September 2002.
[2] Alan Bawden. Implementing Distributed Systems Using Linear Naming (LtU story).
[3] Charles Stewart. Compiling AGG into the Network Linear Graph Reduction System (talk abstract). In G. Taentzer, L. Baresi and M. Pezze, editors, Proceedings of the Second International Workshop on Graph Transformation and Visual Modeling Techniques. Milano, Dipartimento di Elettronica e Informazione, Politecnico di Milano, 2001.
[4] John Lamping. An algorithm for optimal lambda calculus reduction. 1989.
[5] Georges Gonthier, Martín Abadi and Jean-Jacques Lévy. The geometry of optimal lambda reduction. 1992.
[6] Charles Stewart. Three Kinds of Sharing in the Lambda-Calculus (talk abstract).

Seems too expressive

I somehow doubt physics has a notion of 'blocked rewrites', and conservation of energy and information tends to suggest reversible computing (e.g. reversible cellular automata). But I don't doubt you could encode arbitrary cellular automata in these graph grammars.

In any case, we certainly don't have enough information to determine the nature of physics (cf. relevant xkcd ;).

Hyperedge replacement grammars are non-blocking

...by default (i.e., without negative constraints on rules), although blocking is a plausible interpretation of certain physical phenomena, e.g. quantum exclusion principles.

I don't believe in 'one true PL', but I do believe that some formalisms are well-suited to model the kinds of systems one meets in the real world better than others.

I fully agree

I agree with that. I've read some reports on hardware amenable to functional languages (the Lisp Machine, and more recently some FPGA work by a technical university in the Netherlands), and I don't think it will ever emerge.

Personally, I think the idea that a functional language is a GPL should be given up. It may have nice niche markets where you want strong guarantees on the correctness of algorithms written, no matter the cost in performance, like hardware specification and verification.

Therefore, I stopped caring too much about performance and am now writing my small declarative language interpreter in C++.

But, yeah, even in such a niche market, programmers still want for loops.

You can always develop

You can always develop special purpose hardware that runs functional code better than it runs imperative code - LISP machines - but whether it will be comparably fast to OTHER hardware is questionable. Also, see the Atari Transputer (not exactly functional).

The bottlenecks for conventional functional and imperative code are around the same order. GPUs were for a long time quite functional (vertex-pixel shader pipeline), but that all went away with the GPGPU stuff and it is now incredibly imperative with a tricky memory model to boot. The future of high performance seems to be CUDA (barring big data which cannot fit in GPU memory), and performance is incredibly predictable (not so much abstraction).

Not functional

The Transputer was (is) emphatically not functional. It was designed for the kind of massively parallel applications enabled by the Occam language, which was imperative but did not permit shared data between its lightweight processes. The Transputer (the INMOS processor, not the Atari workstation based on it) ran into two major problems, neither directly related to performance relative to "standard" hardware:

  1. Using it well required learning to program in the kind of fine-grained concurrent style that suited the Transputer architecture. That was a huge barrier to entry for a lot of people. It would probably be less so today, since that sort of concurrency is much more familiar to practicing software developers now (see Erlang, Go, etc.).
  2. Although the early Transputers were very popular despite the language barrier, INMOS suffered major delays in releasing its next-generation Transputer. At the same time, Intel released a new generation of faster processors, and the Transputer fell out of favor.

The future of processor architecture is many-core (not CUDA)

The future of processor architecture is many-core and not CUDA, because CUDA is too hard to program.

Transputer architecture was limited by the following:
* Limitations of expressiveness of Occam
* Limitations of performance of Occam because of requirement to use channels, etc.
* Lack of many-core processors

CUDA is absolutely

CUDA is absolutely incredibly successful (problems that took weeks to solve now take hours or less), it just isn't general purpose. The future of high performance systems will continue to be split between distributed and GPU, multi-core performance will continue to be a thing that matters to those who don't want to (or can't afford to) invest in high performance.

In many applications, many-core can outperform CUDA

In many applications, many-core can outperform CUDA.

Many-core will do better and better against CUDA as the number of cores increases.

There are of course

There are of course applications where CUDA does not provide a speed up...at least without re-thinking the application. Mini-batching, for example, is used as a way of making gradient descent applications more SIMT when their original formulations are not. The GPU is only truly limited by the small size of its memory, which is why big data will continue to be distributed (you can also use GPUs and distributed at the same time, but that is still immature).

Simply adding cores won't allow CPUs to catch up with GPUs, especially where GPUs are obvious wins. And even if they did, the power costs would be astronomical. Instead, we are more likely to see a fusion of CPUs and GPUs (general-purpose CPUs with on-chip SIMT capabilities), as we saw earlier with SIMD. Larrabee tried this; it didn't work well, but I expect Intel/AMD/NVIDIA to keep banging on it.

For personal use, I don't think we will see big many-core

I don't have the feeling we will see big many-core for personal use. Maybe for server farms where you're less hindered by Amdahl's law.

There is still some room in Moore's law left and room for some vertical (up the stack to language/OS/application) and a lot of room for horizontal (SoC) integration, so personal computers will get somewhat faster.

But I mostly think this is it. Endgame.

Carbon nano-tube memory

Carbon nano-tube memory has been in development since 2001, and sample chips are going out now. Graphene transistors offer some good potential, as does photonics. There are still architectural improvements, like vertical stacking, that will enable wider bus widths. And that's just what's on the horizon right now. There are a few more orders of magnitude of scaling right there.

None of that is exclusive to

None of that is exclusive to general purpose cores, however. The problem is that every chip advance seems to widen, not shrink, GPU/SIMT performance leads.

Single core gets faster

Parallel software is harder to write, so as single core performance improves, less and less will actually need the parallelism.

it doesn't work like that in

It doesn't work like that in HPC: you turn days into hours, then you go from hours to minutes, and then...seconds. That, and energy consumption is turning out to be a bigger deal; general-purpose cores just aren't very competitive.

But if you aren't in that market and performance is just icing...then it is another ball game.

HPC

I agree, but HPC is already running on 1000+ node clusters, so that is not really going to change. HPC is a very small part of the total market.

Actually, that is more big

Actually, that is more the big-data side of HPC. There is a push to combine CUDA and MPI, but it is still quite early.

Contained operations

I have done some CUDA work, and it's more like highly parallelisable macro-operations. So, for example, if I want to do a Fourier transform, I might have a kernel function and use CUDA to run it in parallel. OpenCL lets you do this on CPU or GPU, so you can use all available resources. This is still not (and will never be) suitable for general program control flow. I agree this is a good target for a language, so that the parallel/serial split is as seamless as possible, although I think you would still be writing a parallelisable kernel function (as any kind of auto-kernel extraction is likely to result in lower performance).

Betting on Disruptive Technology

The great thing about Moore's law is that it gave a feasible roadmap without betting too much on disruptive technology (though I gather they needed to change some things along the road).

Hence Intel and others gladly laid down billions of investment money each time, since they knew miniaturization would pay off predictably: Moore's law implied a doubling in processor speed and chip real estate. They were even forced to, since they knew they wouldn't be able to sell a processor if they didn't scale down along with the rest of the competition.

It wasn't disruptive, it was just betting on being able to scale down silicon lithography, which was a safe bet.

The road from graphene to production is so disruptive that I am not buying it yet, and I don't see people laying down the billions yet either.

It might happen. It may also never emerge. I am quite happy with my laptop. By the time they industrialize graphene transistors we'll probably be twenty to thirty years further down the road.

And even if they can, there's no reason to assume those chips will make it to the consumer market soon. A cheap silent quadcore laptop with loads of memory will serve all my basic needs for the foreseeable future.

Nanotube RAM + Graphene

What I take away from that

Graphene transistors are still a long way off, but there are many competing technologies to replace DRAM and NAND flash.

I doubt the NRAM story a bit. It looks to me like fabs are now in the test phase of seeing whether NRAM might be a possible contender, though the article hints that they somehow finally managed to get the write speed up, which would be a significant breakthrough for NRAM.

With the list of possible contenders we'll likely see faster, bigger, and cheaper(?) non-volatile memory at some point in the future. I agree.

Number of cores will continue exponential growth for a decade

The number of cores will continue exponential growth for a decade and will be the main driver for increases in general purpose computing on phone computers, personal computers, and servers.

Software is going to have to catch up!

There is no study warranting that idea

If you look at contemporary studies such as "Dark Silicon and the End of Multicore Scaling, D. Burger et al." then I don't see it happening fast.

Amdahl's law predicts something like that: even with 99% parallelizability you won't scale much beyond twenty cores or so. I only see webservers scale linearly because they are, more or less, lots of independent tasks running concurrently.
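For reference, the bound being invoked is the standard Amdahl formula, with $p$ the parallelizable fraction and $N$ the number of cores; the "twenty cores" figure is a reading of where the returns diminish sharply:

$$S(N) = \frac{1}{(1 - p) + p/N}, \qquad S(20)\big|_{p = 0.99} \approx 16.8, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p} = 100.$$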

CS won't solve this. CS has never had anything but a modest impact on computing; almost all the work is/was done by physicists and EEs, and that will remain so.

Tree algorithms

For algorithms that involve shared access to a tree data structure the number of useful cores is between 4 and 8, so most desktops are already at the limit for this kind of application.

Where did you get that Wisdom?

Not being snarky, I just want to know the reference. I am interested since my language is a tree rewrite system at heart (a DAG if I am being pedantic).

I am betting on tree rewrite systems instead of graph rewrite systems (as in Haskell/Clean) since tree rewriting is somewhat naturally supported with a reference counting GC and I believe that at some point reference counting may be faster than the stop-the-world approaches you need for graph rewriting systems.

At some point, given vertical integration, I have the feeling that reference counting might make it to the silicon level (if C++'s shared pointer becomes very popular, probably), so it might pay off.

Though I would need to change my mostly pure language for programming in the large I think.

UCT on HPC.

UCT tree algorithms descend the tree each iteration and update the nodes on the return. Marshalling the parallel access to the shared tree seems to be the bottleneck. See: https://www.cs.uni-paderborn.de/fileadmin/Informatik/AG-Platzner/People/Schaefers/uctTreesplit.pdf

Silicon will scale to at least 10^3 cores within 10 years

Silicon will scale to at least 10^3 cores within 10 years and will be mainstream on slate computers, PCs, and servers because this is the only way to get general purpose performance.

Amdahl's analysis applies to simple parallelism, not to massively concurrent systems.

As usual, the software is late to the party :-(

What's wrong with C? And I've got all the performance I need

Why is software late to the party? Almost everything runs on C/C++ with a few top applications in Java, .Net or scripting languages.

Software is doing fine; it is being written. C/C++ adapts to almost all the needs of industry. CS/PLT is just about philosophizing over whether they can create better languages than C/C++ or Java; so far, they have failed. While eating the "free lunch" of physics/EE, I dare add.

As I've stated before, I've got all the performance I need. Less moving parts in my laptop and a lot more, and faster, non-volatile memory would be great though.

The real challenge is server farms or data centers. If you can scale a data center down to something which fits into your pocket you've got a clear winner.

Because clock frequency stalled, software is getting slower

Because clock frequency stalled, software is getting slower and slower due to natural growth.

less stick, more carrot

Marco's right: folks are mostly using C to utilize extra cores. Massively concurrent systems like network load-balancers can scrape by with hash-based partitioning of load assignments. A process per die with a thread per core is an easy thing to configure, even using the most primitive means. Flows typically have almost nothing in common, unless the implementation uses a common resource (which is common, and is the cause of less than 100% parallelism).

You can't sell change by warning that unused power will be left on the table, because it won't. The good parts of what a different architecture offers are the selling point. The carrot in a suggestion is better than a stick, because the art of dodging sticks is down cold, and no one cares about bruises anyway. You want to say software will be easier and faster to write, test, and debug, even by the next generation of devs with even less experience in systems programming than the last generation. Both Rust and Go are working different angles on development cost.

New kinds of applications: require enormously more performance

New kinds of applications will require enormously more performance.

What are those?

What applications? I don't see them on the horizon. I need a reason to part with my money...

The only need for performance I see is in AI. My laptop is a great consumer device; it serves all my needs. I can type text, and edit/stream audio and video content. More memory and bandwidth are more important to me than a faster processor. The data center is another game.

The only thing it doesn't do yet is talk back. And that's probably not a language problem but a system engineering problem.

320 cores then?

I have 32 cores on my desktop. I had 2 in 2000. I agree that this exponential trend may continue and give me roughly 10x more cores in 10 years. Or did you mean a higher exponent?

I didn't say it was

I didn't say it was functional, I just used it as an example of an architecture that was significantly different that didn't succeed.

The main problem is that the mainstream architecture will continue to have an advantage in R&D, and it's not clear if alternative architectures can succeed with similar investments, and it is definitely clear that they cannot with lower investments.

Of course

Sorry if it seemed that I was implying you thought the Transputer was functional, Sean. Not my intent. My comment was aimed more at other readers who might not be familiar with the Transputer design.

The universe's hardware is

The universe's hardware is probably imperative, just massively distributed with relativistic consistency (updates are observed at the speed of light).

My beef is not so much about optimal performance, but just knowing what order the performance will be in. A functional implementation that isn't doing magical optimizations will give you that just as much as an imperative implementation. We don't need fast, just not surprisingly slow.

Unclear

I think the jury is still out on what the nature of the universe's hardware is. Entanglement would tend to suggest that the speed of light might not really be the key to understanding how it all works underneath.

Indeed, recent work in

Indeed, recent work on axiomatizations of quantum mechanics as an information theory has some interesting things to say. Basically, only one of six principles (purification) distinguishes QM from classical information theories.

Entanglement doesn't

Entanglement doesn't transmit information though. As far as we know, the speed of light is the limit for information transmission.

ignorance

I'm not sure we know all that much; dark matter, and even more so dark energy, seem like a rather strong suggestion that we've made a seriously wrong turn somewhere (fwiw, a phrase I've been playing with lately is "robust nonlocal interaction").

Note that if information can

Note that if information can travel FTL, time travel to the past is possible and we have some serious issues with causality.

That conclusion likely comes

That conclusion likely comes from supposing general relativity plus FTL info-transmission. It may be appropriate to think further outside the box: causality is a primal notion of "time", while general relativity is a particular technical treatment defining "time".

I don't see why dark matter

I don't see why dark matter or energy are indicative of a "seriously wrong turn". Observational discrepancy with theory that could only be accounted for with some hitherto undetected phenomenon is hardly unprecedented. In fact, it's exactly what I'd expect to see in any science.

Turn it around: if current

Turn it around: if current theory were seriously mis-structured, by what sort of observable symptom might that mis-structure reveal itself? The further beyond ordinary experience your theory gets, the more possible it is that some theory with a radically different internal structure might have extensive empirical overlap. Deep current theory may also exert extreme influence on thinking to prevent asking questions that would distinguish it from alternative theories. So a serious mis-structure should be taken seriously as a possibility if a symptom does show up. And it's one thing to make minor adjustments to current theory; but when the "adjustment" is bigger than everything observable in the universe, that seems a plausible candidate for a symptom of mis-structured theory.

[Note: obviously we're drifting a tad off PL design. Fwiw, I do tend to see structuring theories of physics and structuring programs as kind of the same activity.]

That's not the point. We

That's not the point. We don't even know what information the universe contains, let alone how it might be stored. I think speculating about the "universe's hardware" without this understanding isn't likely to produce any useful insights.

That time and state

That time and state exist is, I think, not even very controversial. It isn't declarative: state(t + 1) depends on state(t). What's more, we also know that information (observations on state) most likely cannot travel faster than c.

All that shows us is that ideal declarative models are just that..ideal. We might want to work with reality rather than pretend it doesn't exist.

Actually, for a unified

Actually, for a unified theory of everything(TM), both of those things are up in the air. The laws of physics are all symmetric with respect to time, with the exception of gravity, so there's no real good reason to believe that time as a separate dimension isn't merely an artifact of our perception. Besides, even if we do posit time and state, if our models are wrong about what the fundamental unit of information in our universe is, then we'll end up with wildly inaccurate conclusions.

I'll give you an analogy to illustrate what I mean. Suppose that studying computers from orbit at a very coarse level of granularity with only the broadest of surveys we concluded that Earth itself were a computer, and desktop PCs were its "storage cells" and that they could store only one bit at a time, and a bit could take on one of the values "Windows" or "Linux". That's a laughably coarse and completely inaccurate characterization of Earth because it would lack understanding of a whole level of state underneath. But that's kind of what we are dealing with in physics right now.

The "reality" fallacy

Your dubious assumptions about the universe's "hardware" aside: it has zero relevance for finding good programming abstractions, unless perhaps you want to design the uber DSL for writing physics simulations. Software is fiction, not reality.

The idea that some state's

The idea that some state's current and final value isn't immediately known (and that we should deal with that) has had lots of relevance to me in finding good programming abstractions, not even related to designing uber DSLs for physical simulations (though I tend to use physics examples a lot).

Your own mileage of being inspired by reality will vary of course.

I actually think the work on

I actually think the work on QM as an information theory might have something meaningful to say here, in the sense of safely composing programs that scale well regardless of their separation across time and space. Software is fiction, but it has to run in reality, so we need something that can map well to reality at a certain level of abstraction.

Another example

I'll give you another example to illustrate how badly we could misperceive reality. Suppose there was a small window in the wall through which we can only see a small rectangle of light. Now suppose a train painted with alternating colors travels behind it in such a way that we see only one small portion of the side of the train as it passes by. We see the color of the window "change" over time and might conclude that the window stores information. This is because we cannot observe the physical process behind the wall, only the information passing through the window. If the train had an unpredictable or at the least very complex sequence of colors, it might be difficult or impossible for us to formulate any theories about what causes the window to change color.

Now suppose the train were instead a rotating wheel. Then we would eventually see a repeating pattern and even start to develop theories about and predict the color of the window. But we still haven't perceived the real reason the window looks red at 4pm, and we still might labor under a delusion that the window itself is a storage cell for information. Our conclusions based on this theory can be entirely wrong. Our ability to successfully predict the color of window depends on our ability to reconstruct a model of the reality on the other side.

You might be excited at this point that we'll look at the window long and hard enough to do calculations that reconstruct the model of reality on the other side. You're basically looking for hidden variables. Unfortunately, you are likely to fail because, first, Bell's theorem rules out local hidden variable theories, meaning any communication necessary to implement entanglement is by nature non-local, and second, the complexity of wave functions would tend to suggest that a deterministic information theory with non-local interaction might be far, far more complex than we can imagine. What that basically means for our window thought experiment is that at the deepest level, there probably isn't a train, or a wheel, or something simple we can think about that will allow us to reconstruct the window-color-determination function in terms of balls and gears and pendulums hidden in a little box behind it.

I'm not a physicist, so apologies to those who might be if I've butchered QM here, but let's just summarize: Assuming the universe operates somehow like the computers we've built so far, with little cells of memory and integer arithmetic seems very naive.

I think the picture is a

I think the picture is a little more optimistic. After all, we can actually run repeatable experiments that control for different variables in order to derive the most parsimonious picture of what's happening behind that wall, i.e. we can run the trains at will, select what colours they should be at any given time, etc.

As for the complexity of deterministic non-local hidden variable theories, we've had one since the 1950s that is no more complex than orthodox QM and has survived every attack on hidden variables: the de Broglie-Bohm interpretation.

The only other loophole to the existing no-go theorems is so-called "superdeterminism", which fortuitously also has an interesting computer science angle: 't Hooft's cellular automaton interpretation of QM.

We see this type of contextuality in software (and life!) all the time, where past choices constrain future choices. The sophistication of entangled state is just totally mindbending in a superdeterministic QM.

free will theorem

Entanglement would tend to suggest that the speed of light might not really be the key to understanding how it all works underneath.

Entanglement (e.g., spin and paired-particles) plus a finite speed of even just *some* information propagation implies that even complete knowledge of the history of the universe is not enough to predict its future evolution. From very weak assumptions we can conclude that it is impossible to perfectly "understand how it all works underneath".

Free will theorem (Conway and Kochen)

Contextuality

The set of no-go theorems only exclude non-contextual, local hidden variable theories. Deterministic QM is still alive and well, so your final claim is stronger than the reality.

re deterministic QM

As I understand them, Conway and Kochen do not claim to have falsified deterministic models of QM. I think they are consistently emphatic about this when asked.

On the contrary, they deduce from their axioms that the decision between deterministic and non-deterministic models is not a scientific question in the sense that either hypothesis can be compatible with observation but neither hypothesis can be falsified through observation.

It is on that that I base my statement:

From very weak assumptions we can conclude that it is impossible to perfectly "understand how it all works underneath".

I was replying more to your

I was replying more to your other statement, "[...] even complete knowledge of the history of the universe is not enough to predict its future evolution", which is false if QM is deterministic and you truly do have complete knowledge of the history of the universe. I suppose it depends on what you include in "complete knowledge of history".

a free will theorem lemma

I was replying more to your other statement, "[...] even complete knowledge of the history of the universe is not enough to predict its future evolution", which is false if QM is deterministic and you truly do have complete knowledge of the history of the universe.

I propose to build a replicable experimental apparatus that will conduct some number of SPIN measurements, and output some finite string of 1s and 0s.

To convince me you have a complete theory of an allegedly deterministic universe, you must in advance give me your prediction for the output of my machine.

Before any new measurements are taken, I'll input your prediction to my machine.

If your prediction is that my machine takes 1 or more SPIN measurements, I'll input that by throwing a switch that blows the machine up before any measurements are taken.

If your prediction is that my machine takes no measurements, I'll switch it on to take at least one measurement.

What does that imply?

On the one hand, if you predicted my machine would output 1 or more measurements, and my machine blew up instead, your theory is contradicted.

On another hand, if you predicted 1 or more measurements, and my machine somehow failed to blow up and produced output, we know that by definition the experimental apparatus failed.

On a third hand, if you predicted no outputs from the machine, then it does not get blown up, it takes some measurements, and your theory is contradicted.

And on the fourth and final hand, if you predicted 0 measurements (not outputs) and the machine blows up or otherwise fails to produce outputs, then again the experimental apparatus failed.

You can never empirically demonstrate your alleged complete theory of future measurements of the universe, no matter how much about the past you know.

Since you can never empirically demonstrate your theory of what my machine will output, even in idealized conditions, you can never demonstrate that you have any scientific knowledge of it.

QED

This argument is fallacious.

This argument is fallacious. Either the experiment being conducted cannot include the prediction made, or you are requiring that any such predictor solve the Halting Problem. This is obviously impossible in general, so this does not serve as a valid disproof that I have a complete deterministic theory of the universe; i.e. even if I did have it, I wouldn't be able to satisfy your experiment, so this argument cannot disprove that I possess such a theory.

Any experiment that does not include the prediction as an input works fine, and serves as acceptable empirical validation. Of course, it's clear that some predictions will simply be intractable, even if they are not impossible in principle. Even your experiment may be conceptually ill-formed: if we need to store the entire history of the universe, then we already require more particles than exist in the universe.

Finally, to be clear, when you said full history was "not enough to predict [the universe's] future evolution", I took that to mean that the future evolution is non-deterministic, which many assume QM entails. If you meant that it may be deterministic, yet still impossible to predict due to intractability or incompleteness, then that's true and we're on the same page.

not fallacious: it's about knowledge

when you said full history was "not enough to predict [the universe's] future evolution", I took that to mean that the future evolution is non-deterministic,

That is not what it means, though.

The Free Will Theorem(s) are about what is empirically knowable, at specific times, in specific frames of reference. Given the axioms, no frame of reference causally prior to one of the SPIN measurements in the FWT set-up can predict the outcome of the measurements.

Every frame of reference which is causally prior to some future SPIN experiment can have only incomplete information about the future evolution of the universe. New information will arrive to that frame of reference when the SPIN measurement is made.

Scientific knowledge is intensely a relation between multiple frames of reference. To demonstrate that scientific knowledge exists, the knowledge must be communicable between people. Further, scientific knowledge must yield predictions of future observations based on the outcomes of past observations.

The FWTs show that no such predictions are possible for the SPIN measurements, at least given FWT axioms.

If you want a complete theory and want to cling to FIN or its alternative MIN, you must toss out either or both of SPIN and TWIN.

Any experiment that does not include the prediction as an input works fine, and serves as acceptable empirical validation.

It does not work fine because of the social nature of scientific knowledge. You have scientific knowledge if you can demonstrate it to others. To demonstrate it to others you must give them the prediction in advance. Treating this as "input to the apparatus" is a way of saying that your prediction must causally precede the measurement for it to be, convincingly, a prediction.

if we need to store the entire history of the universe, then we already require more particles than exist in the universe.

There might be separate, information-theoretic axioms that would lead to some sort of argument like that but we don't need anything that complicated in the context of FWT or strong-FWT.

Actors: exponentially faster than parallel lambda calculus

Actors can be exponentially faster than the parallel lambda calculus :-)

Clean does this

Just saying: I glanced at the source code of their compiler once, and it didn't make me very happy.

vector use case

I think if FP supported the linear vector use case with flexible data types, and perhaps an option for column-oriented storage (a simple option to implement a vector of tuples as a tuple of vectors), that would probably be a good-enough foundation for building most other data structures you can dream up in libraries.

Besides that, annotating some loopy code as suitable for 'static allocation' in a sense like SAFL could possibly go a long way for reducing the burdens on garbage collectors. The important bit is explicit annotations so we don't lose this property without raising a compile-time error.

Regarding comprehension: imperative might sometimes fit an algorithm better, but FP can generally support imperative styles where appropriate (e.g. via monads or composing purely functional commands of type (stack*world)→(stack'*world)). Pure functions and immutable data structures have very nice properties for maintenance, independent testing, reuse, and comprehension at larger scales (libraries, apps).
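A rough Rust sketch of the "composing purely functional commands" idea, simplified to a stack of integers rather than a (stack*world) pair; the command names are illustrative, not anyone's actual design. Each command is an ordinary value (a pure function from state to state), so sequencing imperative-looking steps is just function composition.

```rust
type Stack = Vec<i64>;

// A "command" is a pure function Stack -> Stack, returned as a value.
fn push(n: i64) -> impl Fn(Stack) -> Stack {
    move |mut s| { s.push(n); s }
}

fn add() -> impl Fn(Stack) -> Stack {
    |mut s| {
        let (b, a) = (s.pop().unwrap(), s.pop().unwrap());
        s.push(a + b);
        s
    }
}

fn main() {
    // Sequencing is composition: push 1, push 2, add.
    let program = |s| add()(push(2)(push(1)(s)));
    println!("{:?}", program(Stack::new())); // [3]
}
```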

The performance gap is a larger concern. I don't really like how Haskell tries to handle it with specialized ST monads because repeated freeze/copy-thaw operations on arrays don't have nice compositional performance. I haven't tried DDC Haskell, but it's probably closer to what I want.

Monads are type-level.

A monad (as in Haskell) is a type. We can give monadic types to imperative programs and type-check them as such, but there is no need for the compiler to compile these as functional code running on something like the spineless tagless G-machine. If we can give monadic types to C code, for example, in order to type check it, we could still compile it the way a C compiler does.

The linear vector on its own is a model for memory, so in some regards I agree with you, but you have created an imperative language with a memory model much like C's, just without the straightforward syntax.

simplicity of compilation

While I agree that specialized compilation for certain monads is feasible, I don't imagine it's easy to implement (effectively multiple low level compilers) or integrate with other such monads and the purely functional code. Linear vectors seem relatively simple in comparison: simple for compilers and interpreters, simple for humans to understand, integration is just returning the updated value.

Complex approaches might be worthwhile in the long term. I would love to see more work on compiling subprograms for FPGAs or OpenCL, or leveraging SIMD, etc.. REPA and GPipe and so on in Haskell are very interesting. But I'd be much happier with something very simple covering the 80% use case, predictable and portable across many machines and compilers. :)

I agree regarding linear vectors as a model for memory. In many ways, `functional plus linear values` is very similar to `imperative minus aliased state`. And FP can model aliased state relationally where truly necessary, e.g. via indices into a vector (though this does re-introduce memory management concerns at the model layer... OTOH, that's an opportunity for modeling problem specific GC strategies, manual or otherwise).

Syntax has potential to be a separate concern. An imperative EDSL could be compiled via library into a pure functional intermediate code that is then compiled into a low-level imperative program. This is the approach I'm favoring for Awelon project. The intermediate purely functional bytecode guards nice properties for portability, security, serialization, distribution, separate compilation, reuse, replay, testing, maintenance... many of my big concerns. Linear vectors to regain mutation-based performance seem a promising fit in this context.

Effect polymorphism

Effect polymorphism's a big thing in contemporary research — IIRC, papers from both Scala people (Lukas Rytz) and Conor McBride (his draft on Frank) make a big deal about it, and maybe there's more. Should we have a thread with pointers to such papers?

Regain?

For some of those (3, 5, 6) it might be possible for Rust to regain them in future versions. Point 4 is fine, because the language seems expressive enough that runtime integration would not add much.

I'm not thrilled about the

I'm not thrilled about the lack of #3, at least as far as TCO is concerned.

The typical justification that tail recursion can be trivially transformed to iteration by a programmer (let the compiler do it for me!) isn't very compelling to me, so I hope that the Rust developers find a way to address the issue.

I suppose that functions such as Haskell's until could be defined for tail-recursive code, or even folds and unfolds could be used for simple cases (I've done that in Elisp and JavaScript); see the sketch below.
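A hedged sketch of that suggestion in Rust: an until-style combinator (the name mirrors Haskell's until; it is not a standard Rust library function) that expresses a tail-recursive computation as a loop, so no TCO is needed from the compiler and stack usage stays constant.

```rust
// Repeatedly apply `step` until `done` holds; each "tail call" is one iteration.
fn until<T>(done: impl Fn(&T) -> bool, step: impl Fn(T) -> T, mut x: T) -> T {
    while !done(&x) {
        x = step(x);
    }
    x
}

fn main() {
    // Factorial via an accumulator pair (n, acc), as one would write it tail-recursively.
    let (_, fact) = until(
        |&(n, _)| n == 0,
        |(n, acc): (u64, u64)| (n - 1, acc * n),
        (10u64, 1u64),
    );
    println!("{}", fact); // 3628800
}
```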

The Rust team has always

The Rust team has always seen tail call optimization as a desirable feature, but initially rejected it partly because of tricky semantics when combined with destructors, and partly because of implementation issues. Since then, improvements in LLVM's TCO support led to a renewed proposal for TCO in Rust. There wasn't time to flesh out the design and implement it before Rust 1.0, but it's on the table for future versions.