LtU Forum

Egel v0.1.8 (beta) released - do syntax

Another Egel release. New in this patch: do syntax for applications, some utf bugs squashed.

Egel is a small functional scripting language. What sets it apart is that at any time during execution the program, or any program fragment, forms a directed acyclic graph. That makes concurrency, serialization, and shipping of values more trivial than in most other languages and I want to exploit that once to make mobile code a breeze to implement. Shoot any combinator anywhere!

Meanwhile, the egel language supports only the bare minimum as a front-end to a graph rewriting semantics. At the same time, sometimes I implement little syntactic sugar to make programming more pleasant.

The latest addition is do-syntax. Like in many functional languages, you can set up application chains. For example, 0 |> ((+) 1) |> ((*) 2) will reduce to 2.

I found a slight need to facilitate this kind of programming with do, a do expression abbreviates a chain, for example, def f = do ((+) 1) |> ((*) 2) is sugar for def f = [ X -> 2 * (X + 1) ]. I have found this very convenient while programming solutions to last year's Advent of Code.

Of course, this isn't Haskell's monadic do syntax, but as we all know, applicatives are as good as monads. Or are they?

Egel v0.1.8 (beta) released - do syntax

Another Egel release. New in this patch: do syntax for applications, some utf bugs squashed.

Egel is a small functional scripting language. What sets it apart is that at any time during execution the program, or any program fragment, forms a directed acyclic graph. That makes concurrency, serialization, and shipping of values more trivial than in most other languages and I want to exploit that once to make mobile code a breeze to implement. Shoot any combinator anywhere!

Meanwhile, the egel language supports only the bare minimum as a front-end to a graph rewriting semantics. At the same time, sometimes I implement little syntactic sugar to make programming more pleasant.

The latest addition is do-syntax. Like in many functional languages, you can set up application chains. For example, 0 |> ((+) 1) |> ((*) 2) will reduce to 2.

I found a slight need to facilitate this kind of programming with do, a do expression abbreviates a chain, for example, def f = do ((+) 1) |> ((*) 2) is sugar for def f = [ X -> 2 * (X + 1) ]. I have found this very convenient while programming solutions to last year's Advent of Code.

Of course, this isn't Haskell's monadic do syntax, but as we all know, applicatives are as good as monads. Or are they?

Egel v0.1.8 (beta) released - do syntax

Another Egel release. New in this patch: do syntax for applications, some utf bugs squashed.

Egel is a small functional scripting language. What sets it apart is that at any time during execution the program, or any program fragment, forms a directed acyclic graph. That makes concurrency, serialization, and shipping of values more trivial than in most other languages and I want to exploit that once to make mobile code a breeze to implement. Shoot any combinator anywhere!

Meanwhile, the egel language supports only the bare minimum as a front-end to a graph rewriting semantics. At the same time, sometimes I implement little syntactic sugar to make programming more pleasant.

The latest addition is do-syntax. Like in many functional languages, you can set up application chains. For example, 0 |> ((+) 1) |> ((*) 2) will reduce to 2.

I found a slight need to facilitate this kind of programming with do, a do expression abbreviates a chain, for example, def f = do ((+) 1) |> ((*) 2) is sugar for def f = [ X -> 2 * (X + 1) ]. I have found this very convenient while programming solutions to last year's Advent of Code.

Of course, this isn't Haskell's monadic do syntax, but as we all know, applicatives are as good as monads. Or are they?

Ann: The Logic Lab

The Logic Lab is a high-level DSL for specifying logics. There is a series of Youtube mini-lectures on this technology detailed in the Logic Lab home page. https://shenlanguage.org/Logiclab.html. You can also access these talks at the following addresses.

History of the Logic Lab https://www.youtube.com/watch?v=VqnVWLFiEII
Specifying Logics https://www.youtube.com/watch?v=AU8WCkS9n1U
Conducting Proofs https://www.youtube.com/watch?v=1WXMMk2xUSM&t=1s

The Logic Lab page also includes a download link for the program. This application needs the Shen standard library. There is a link on the Logic Lab home page to a 'batteries included' binary version of Shen under Windows (SBCL platform) with the standard library compiled inside it.

As my first talk makes clear, this is a reissue of an old technology, published by me in the 90s, which has resided in the CMU repository for Artificial Intelligence for nearly 30 years and has been reissued under Shen.

Mark

LinearLisp: a proof of concept in full linearization for AST-walking interpreters

Or, how to walk an AST with (almost) no references on the heap and how it pays off:

https://github.com/ysharplanguage/LinearLisp

Implementation: C# / .NET 6.

Comes with basic microbenchmarks for some contenders: Microsoft's PowerShell, Microsoft's JScript, and Python 3.10+

Rewrite.js – a minimalist s-expr based term rewriting system

Rewrite.js is estimated to be a Turing complete, s-expression based term rewriting system. It may be used as a curiosity computing platform, formula application system, proof checker, problem solver, and pretty much anywhere where any kind of computation is required, as long as slower performance on intensive computations doesn't go beyond limits of user patience.

Rewrite.js is designed as a creation with only one kind of rules: substitution rules. Being such a minimalist creation, complete rewrite.js implementation fits in a bit more than 400 Javascript lines of code.

To get a feeling about rewrite.js appearance, this is a math expression rewriting code in rewrite.js:

    (
        (
            REWRITE
            (
                MATCH
                (VAR <a>)
                (RULE (READ  <a> + <a>) (WRITE 2 * <a>))
            )
            (
                MATCH
                (VAR <a>)
                (RULE (READ  <a> * <a>) (WRITE <a> ^ 2))
            )
        )

        (x + x) * (x + x)
    )

The above example evaluates to:

    ((2 * x) ^ 2)

Rewrite.js is hosted on https://github.com/contrast-zone/rewrite.js with convenient online playground.

Aside from criticism, I'm particularly interested in possible rewrite.js usage ideas.

JIT: it's complimicated

What Exactly do we Mean by JIT Warmup? 2016 but still interesting, I think.

Weird Results. Many benchmarks don’t warmup under the classic model. New goal: Try to understand why we see “weird” results.

Pico Scheme (A purely functional subset of scheme)

The Pico Scheme specification just came out. It is a purely functional subset of Scheme (R7RS-small) that I edited.

1.0 Pico Scheme release

https://github.com/jrincayc/r7rs-pico-spec

Deallocation patterns and linear types (e.g. Rust)

I've been [much] belatedly poking at Rust - particularly the move/borrow model, and I was then thinking about implications for allocation and deallocation performance. Though I haven't looked seriously, a couple of things occurred to me that I haven't seen people talking about. Nothing in any way profound here, but perhaps worth considering for those of you designing or implementing new languages, and it appears to me that this applies to many langugaes using linear type. Rust's reference counted pointers fall outside of what I'm going to describe.

I was pondering lifespans in Rust, thinking that there are probably cases where deallocations in Rust occur later (for naive lifespan reasons) than we might like - at least according to the simple description of lifespans in the specification, which is conservative. Also that the rules for object deallocation have the consequence that the cost of function return becomes hard to understand because it is difficult for a programmer to have good intuitions about deallocation overheads associated with the return. This got me to pondering deallocation and heap fragmentation to try to understand whether that's an actual problem, and then to wonder how many of Rust's allocations can be stack allocated and deallocated in the style of alloca(). So far as I can tell, the answer is most of them. I may have this wrong, but here's the intuition:

  1. In a given function, we know can usually determine statically where an allocated item (more precisely: the associated region) becomes unreachable. This, rather than reverse lexical order, is the point where the region created by the allocation actually dies. Where we can determine this point (and move semantics means we mostly can), we can sort allocations in "reverse region exit" order (last-out-first). This may not be the same as the allocation order, which is the point of interest here.
  2. For objects have statically-known size, this means that we can craft an arrangement of space on the local stack at compile time, such that the statically sized portion of the next-to-die region is always the last (most recent) chunk of storage on the stack.
  3. For regions whose exit order we cannot statically determine (do these exist in Rust?), we place them conservatively as if they will exit last. [This can be improved, but it is correct.] These objects can still have their destructors run in the expected order according to the Rust specification; what's being deferred (so far) is the release of the space.
  4. Some care is needed when a function returns a result in the same region as one (or more) of its arguments. We can (and must) assume that region is still live, but we cannot necessarily tell whether the content of the region has grown - an effect system can mitigate this, and there are other reasons to want an "allocation may happen when you call this" effect.
  5. Where all else fails, we resort to heap allocation, but the reference can still be placed according to order of region death.

If I have this right, region deallocation for statically sized objects is reduced to running destructors (in Rust: drop functions), handling any heap deallocations required to retire references, and then adjusting the stack pointer.

There are practical concerns - many operating systems make naive stack overflow tests that would need to be worked around - but that's neither new nor hard.

The handling of "uncertain death" regions can be improved. Even if we don't know the exact order of region death, we can generally determine some place in the region exit ordering where we know conservatively that a region has become unreachable.

It seems to me that we end up with three kinds of allocation areas:

  1. The stack, where deallocations are simply a stack pointer adjustment.
  2. The "linear object heap", where reachability is still governed in linear type fashion, but the dynamic nature of the allocations may cause objects in different regions to be interleaved. That is: the organization of the linear object heap does not obey a stack discipline.
  3. The "general object heap", which can be reference counted or garbage collected (or both - to eliminate dangling cycles.

The absence of reference counting (or analogous general object heap allocation mechanisms) in a program tells us whether we need to bind the collector into the binary.

For anybody whose thought carefully about implementing linear type, I suspect all of this is pretty obvious. I hadn't gotten quite that far in my thinking about heap management in BitC, though I had certainly been thinking about compiler-driven stack allocation.

Denominated Values - Part numeric and symbolic.

Well as long as "how I spent my summer vacation" is an interesting topic, I've got one.

I've recently been working on/thinking about a programming language type that I haven't seen implemented before. I guess "Denominated values" is the best name for it.

A denominated value is a real number, plus a quasi-symbolic representation of what that number denominates - meters, hectares, seconds, joules, etc etc etc.

The semantics are just what you'd expect: Multiply two denominated values together and you get another denominated value. Divide a length by a time and you get a speed. Divide a speed by a time and you get an acceleration. Multiply a distance times a distance and you get an area. Add two distances and you get a distance. Add a distance to a time and you get an error. Which is the whole point, because the idea is error checking.

Most of the error checks can be free for most compile-able languages - The denomination semantics exist in compilation, but can get boiled out of the executable (except debug versions of the executable). If the program passes denomination semantics during compilation it doesn't need to check again, especially not on every single operation.

I was remembering college physics where misunderstanding the problem often led to doing math and realizing I was trying to do some mathematical operation on values whose denominations were incompatible under that operation. Or when the answer came out with a value of the wrong denomination. Or when I blew a unit conversion.

Of course denominations coming out right is no guarantor of correctness. Consider the popular rule "71 divided by annual percent interest rate equals doubling time." This happens to come out fairly close for small interest rates, but is not a mathematically correct construction and fails hard the further out of its range you get. This is a mistake that someone would make, because they probably learnt it when they were ten and never realized that the mathematical reasoning is fundamentally wrong. This is the kind of mistake we would not want to make.

But it comes out correct in denomination. interest rate is in {thing per time} per {thing}, thus the unit {thing} cancels out and interest rate turns out to be denominated in {inverse time}. So 71 (a unit-less value) is divided by a value in {inverse time} and yields a value denominated in {time}, which is exactly how the answer is interpreted. You can assign this wrong answer value to a denominated variable and never know there was a problem - you print it out in years and it "looks right."

And of course correct operations on denominated values do not necessarily guarantee us correct units. It guarantees us Denominated Values that *CAN* be expressed in the correct units, but are still prone to humorous errors. Even if we have done everything right, some values come out in {inverse time} and can be expressed as interest rates, angular velocities, or frequencies, and whatever units you tell the system to express it in, any inverse-time denominated value will express without an error because those three things are the same fundamental units. They shouldn't be, and I don't really know what to do about that.

Implementation details: The constructor takes real numbers and construct unit-less denominated values, and has a matching 'conversion' that gets a real number if and only if the denominated value being converted is also unit-less.

so you can do things like this.

// implicit promotion of reals to unit-less allows DV*DV implementation to be used producing a DV answer.
// but since INCH is a length, addition will fail giving a NaDV unless foo is also a length.
// it doesn't matter what units were used when foo was assigned; if it's a length, inches will add to it correctly.
answer = 2.0*INCH + foo
...
// implicit conversion from unit-less DVs to reals gives NAN if the argument isn't unit-less.
// It doesn't matter what units were used when any DVs used in the calculation were created;
// if it's a length, this expresses it in inches.
print ("answer is " + answer/INCH + "inches.")

The simple implementation so far is that the number is a 64-bit double and the denomination is an array of 8 bytes. The denominations, and all operations and tests on them, are #ifdef'd to exist in debug mode only. The first byte of denomination is devoted to order-of-ten magnitude, -127 to +127. The remaining 7 bytes are each devoted to the power of one fundamental unit, allowing the fundamental units to have powers from -127 to 127. (I am not using #FF. During implementation it signified overflow).

Addition and Subtraction requires that the unit powers are equal and increase or reduce one of the values by the difference in magnitude before doing the operation. Multiplication and Division adds or subtracts every field of the denomination, including the Magnitude.

Internally values are represented in metric units, with various unit constants like INCH being denominated values representing the metric equivalent of one of that unit. So when someone creates a denominated value using 2*INCH they get the same denominated value as if they'd created it using 5.08*CENTIMETER. And there are also a lot of unit constants that use more than one of the denomination powers (WATT has a +1 power for joules and a -1 power for seconds for instance).

I've created accessors that get the numeric value as a standard real number regardless of whether the denomination is unit-less, or which get the denomination as a 64-bit integer. I'm not sure this was a good idea. I seem to get sucked into using them to do "denomination casts" whose value or hazard I can't really guess.

XML feed