LtU Forum

Do names and symbols really imply semantics? If so, what to do about it?

Some languages, like APL, are written in Martian script. Their proponents insist that a set of characters humans have never seen before in their lives has intrinsic, unambiguous semantic meanings that can be inferred directly from the characters' shapes.

I don't think that is true. Not to humans, anyway. Martians probably see differently.

Some languages have strong requirements about the semantics of operators, based on the familiar semantics and relationships of those operators. For example, there was a beta release of Pascal 7 (from Borland, around 1995 I think) that allowed people to overload operators, but warned that the compiler would simplify expressions, according to the precedence rules the language set for them and the distributive, associative, etc. properties implied by those operators, before any overload definitions were looked up. If an operation was commutative, the compiler was allowed to reorder its arguments arbitrarily. Distributive expressions like (a*b)+(a*c) would be reduced to a*(b+c) before the operator overloads were looked up. If you defined any two relational operators, the compiler would automatically define the rest using the identity axioms; you were not allowed to define three or more relational operators. Etc.

This strong assumption that the semantics of the overloads must follow, in most respects, the semantics of the operators whose names they were using really upset a lot of people who wanted to use '+' to concatenate strings, or who wanted to use '<=' and '=>' as some kind of redirection operators ('>=' being how Pascal spelled 'equal-or-greater'). A lot of it (but not all of it) got changed between beta and release.
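
TypeScript has no operator overloading, but a stand-in function is enough to sketch why the reordering rule alarmed the string-concatenation crowd (concat here is a hypothetical stand-in for an overloaded '+'):

// Hypothetical stand-in for an overloaded '+': string concatenation.
const concat = (x: string, y: string): string => x + y;

// Concatenation is not commutative, so a compiler licensed to reorder
// the arguments of a 'commutative' operator would change the result.
console.log(concat("ab", "cd")); // "abcd"
console.log(concat("cd", "ab")); // "cdab"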

I was never really convinced that changing it was the right thing. It seemed perfectly reasonable to me that these familiar symbols should be constrained to have the familiar semantic properties that we infer when we're looking at an expression built of them. I thought people who wanted a string-concatenation operator or redirection operators should have used different names for their operators, like '$+' or '|>'. Sadly this was impossible, as there was no way to define new operator symbols. This deficiency remains in almost all languages that allow operator overloading.

In Scheme there is a strong assumption/tradition that any variable whose name ends in the character '?' will be bound to a procedure that returns a boolean value. The same idea in Common Lisp is associated with the trailing character 'p'.

The tradition in Scheme goes on: any variable whose name ends in '!' is bound to a procedure that has a side effect. Many Scheme implementations will produce a warning if any such variable is ever bound to anything else, and some consider it an error if a side effect is produced by any procedure not so named, because the name is considered an important warning to the programmer that a side effect is possible.

Scheme used to have 'indeterminate digits' in numbers, written '#': for example, 123# denoted 'one thousand two hundred and thirty-something', an inexact integer. This got scrapped once it became clear that implementors had no interest in keeping track of how many significant figures of decimal accuracy a calculation represented. They were only using 'inexact' to mean 'IEEE 754 floating-point hardware representation' and were openly hostile to the notion that anyone might hope for it to mean anything more. Many regarded it as a contradiction in terms that something could be both 'integer?' and 'inexact?', and I think in the current Scheme standard it may in fact be a contradiction in terms. But getting rid of it meant abandoning the possibility of tracking significant figures of accuracy. So that character used to have a semantic meaning and purpose, but nobody valued that purpose.

Early BASICs had 'type sigils' so that the programmer (and the interpreter) could know the type of a variable just by looking at the name: foo$ always referred to a string, foo% to an integer, foo! to a single-precision number, foo# to a double-precision number, and so on.

People made an awful lot of fun of those type sigils in old BASICs. Until the same people turned around and started using Hungarian notation to keep track of the types of their variables in C++. Because keeping track of the type was HARD, and they wanted to be able to see what type something was by looking at it. So they defined their pszName and hwndInteractionWindow and their dwIDnumber and so on, and didn't think about BASIC's type sigils at all, because this, after all, was something different.

And after all these examples of naming semantics: how much semantics is it reasonable to expect to be able to infer from syntax? How much is it reasonable for the compiler to enforce, based on the name alone? And what relationship does programmers liking a convention have to helping them produce good code?

Monads vs Applicative Functors.

This article (https://jobjo.github.io/2019/05/19/applicative-parsing.html) got me thinking about the structural problems with monads, namely that `bind: m a -> (a -> m b) -> m b` is opaque in its second argument. One thing that I have become increasingly concerned with over the past few years is the separation of data and code, and this article made it clear to me that monads have a problem in this respect: at least in the case of parsers, they mix implementation details into the application of the parser. By combining an approach focused on the initial algebra for parsers with combinators implemented as an applicative functor (and not a monad), the "application" of the parser is reduced to a static data structure, and the `map` functions only generate output.
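
A minimal sketch of the contrast, with hypothetical TypeScript types (not the article's or the library's actual definitions):

// A toy parser type, for illustration only.
type Parser<A> = { run: (input: string) => [A, string] | null };

// Monadic bind: the second argument is an arbitrary function, so the rest
// of the parser is hidden behind a lambda and cannot be inspected without
// running it on some value of type A.
declare function bind<A, B>(p: Parser<A>, f: (a: A) => Parser<B>): Parser<B>;

// Applicative product: both halves are first-class values, so a static
// analysis can walk the whole grammar without executing any user code.
declare function product<A, B>(p: Parser<A>, q: Parser<B>): Parser<[A, B]>;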

Using continuation-passing style to encode the existential types from the GADT (see https://unsafe-perform.io/posts/2020-02-21-existential-quantification-in-typescript) and function overloading, I was able to port the applicative parser to TypeScript, which I have made available here https://github.com/keean/applicative_parser and as a Deno module here https://deno.land/x/applicative_parser. I also added a couple of useful helper functions, based on Parsimmon's `seq` and `seqMap`, which use a little conditional-type magic to make working without operator overloading easier. The result is that we can write a simple parser (for floats) in TypeScript like this:

// The combinators come from the applicative_parser module; digit matches one decimal digit.
const digit = OneOf('0123456789');
// One or more digits followed by a decimal point, e.g. "12."
const float1 = seqMap((a, b) => [...a, b], many1(digit), OneOf('.'));
// A decimal-point form with fractional digits, e.g. "12.34"
const float2 = seqMap((a, b) => [...a, ...b], float1, many1(digit));
// A mantissa (fractional or plain integer) followed by an exponent, e.g. "12.34e5"
const float3 = seqMap((a, b, c) => [...a, b, ...c], choice(float2, many1(digit)), OneOf('e'), many1(digit));
// Try the most specific alternative first, then convert the matched characters to a number.
export const float: Parser<number> = FMap(a => parseFloat(a.join('')), choice(float3, float2, float1));

Monads are of course useful, and things aren't really so black and white, but it seems interesting to consider what can be gained from the separation of data and code that becomes possible if we limit ourselves to applicative functors. The benefits include a complete implementation of the parser initial algebra in 50-ish lines of code; the ability to build your parser application from the nine basic combinators; and applications that can remain the same while you swap in different implementations (with backtracking, error reporting, greater efficiency, etc.).

It also seems interesting to consider the initial algebra for parsers in the same way we consider Codd's five primitive operators for relational algebra. The combinators used in the article are Fail/Empty/OneOf/Return/Forget/Map/Product/Either/Fix. Is this minimal? There are clearly alternative formulations; for example, `OneOf` isn't the only possibility, but it has the advantage that it allows the set of accepted symbols to be generated for a parser application.
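
For instance, because an applicative parser is plain data, a static analysis can compute that symbol set just by walking the structure. A hypothetical sketch over a cut-down three-combinator algebra (not the library's actual encoding):

// Hypothetical, cut-down algebra, just enough to show the idea.
type P =
    | { tag: 'OneOf'; chars: string }
    | { tag: 'Product'; first: P; second: P }
    | { tag: 'Either'; left: P; right: P };

// Collect every symbol the parser could ever accept, without running it.
// This is impossible for a monadic parser, whose continuation is opaque.
function symbols(p: P): Set<string> {
    switch (p.tag) {
        case 'OneOf': return new Set(p.chars);
        case 'Product': return new Set([...symbols(p.first), ...symbols(p.second)]);
        case 'Either': return new Set([...symbols(p.left), ...symbols(p.right)]);
    }
}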

In the context of compiler architecture, it seems interesting that each pass of the compiler can be implemented as a recursive descent over the abstract syntax tree with case matching on node type. With this approach the parser also becomes a recursive descent, with case matching over the syntax definition of the language, which seems an elegant solution.
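
A minimal sketch of that shape, with a hypothetical two-node AST:

// A hypothetical mini-AST; every pass is a structural recursion with a
// case match on the node type.
type Expr =
    | { kind: 'num'; value: number }
    | { kind: 'add'; left: Expr; right: Expr };

// One such pass: constant folding.
function fold(e: Expr): Expr {
    switch (e.kind) {
        case 'num':
            return e;
        case 'add': {
            const l = fold(e.left);
            const r = fold(e.right);
            return l.kind === 'num' && r.kind === 'num'
                ? { kind: 'num', value: l.value + r.value }
                : { kind: 'add', left: l, right: r };
        }
    }
}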

Thinking about programming languages themselves: since we can thread state through an applicative functor, and we can assign to scoped variables (map), it seems interesting to think about what a functional language that uses applicative functors to manage state and IO would look like. Obviously existing programming languages can implement applicative functors, in the same way you can write object-oriented code in procedural languages, but they don't focus on it. Likewise there would be nothing stopping someone writing monadic code, but you would not want to encourage it. What would a language designed to encourage the programmer to think in an applicative, initial-algebraic way look like?
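
As a rough sketch of what "threading state through an applicative functor" means, here is a hypothetical State applicative in TypeScript (the names and encoding are mine, not from any of the posts above):

// A state-threading computation: takes a state, returns a result and a new state.
type State<S, A> = (s: S) => [A, S];

// Lift a pure value without touching the state.
const pure = <S, A>(a: A): State<S, A> => (s) => [a, s];

// Applicative 'ap': run the function-producing computation, then the
// argument-producing one, threading the state left to right. Unlike bind,
// neither computation's structure depends on the other's result.
const ap = <S, A, B>(f: State<S, (a: A) => B>, x: State<S, A>): State<S, B> =>
    (s0) => {
        const [g, s1] = f(s0);
        const [a, s2] = x(s1);
        return [g(a), s2];
    };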

CFP: PLOS '21: 11th Workshop on Programming Languages and Operating Systems

Those of you who apply advanced PL ideas to operating systems might be interested in the upcoming PLOS 2021 workshop. See the abbreviated CFP below, and please consider submitting a short paper!

Thanks! --- Eric.

(ABBREVIATED) CALL FOR PAPERS

11th Workshop on Programming Languages and Operating Systems (PLOS 2021)

October 25, 2021

An Online Workshop
Sponsored by ACM SIGOPS
In conjunction with SOSP 2021

Paper submission deadline: August 6, 2021


Historically, operating system development and programming language development went hand-in-hand. Today, although the systems community at large retains an iron grip on C, many people continue to explore novel approaches to OS construction based on new programming language ideas. This workshop will bring together researchers and developers from the programming language and operating system domains to discuss recent work at the intersection of these fields. It will be a platform for discussing new visions, challenges, experiences, problems, and solutions arising from the application of advanced programming and software engineering concepts to operating systems construction, and vice versa.

Please visit the Web site for more info: https://plos-workshop.org/2021/

What is a type?

There's nominal typing.

There's structural typing.

But what if an object's type is simply the aggregate of its behavior?

I'm curious if there are other type systems that work in this same manner. Comments and criticism welcome.
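
For comparison, TypeScript's structural typing already gets partway there: a value's declared name doesn't matter, only its shape. A small illustrative sketch (my example, not from the post):

// Structural typing: any value with the right shape is accepted.
interface Quacks { quack(): string }

function speak(d: Quacks): string { return d.quack(); }

// duck has no declared relationship to Quacks; the shape alone suffices.
const duck = { quack: () => 'quack' };
console.log(speak(duck)); // "quack"

A type-as-aggregate-of-behavior system would go further, identifying the type with everything the object actually does rather than with a declared shape.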

counterexamples.org

Counterexamples in Type Systems

The "counterexamples" here are programs that go wrong in ways that should be impossible: corrupt memory in Rust, produce a ClassCastException in cast-free Java, segfault in Haskell, and so on. This book is a collection of such counterexamples, each with some explanation of what went wrong and references to the languages or systems in which the problem occurred.

Emacs modes: what are they?

I've been using Emacs as my OS since time started, and I've always taken its programming model for granted, as part of nature. However, now that I'm looking back at it while trying to design an editor/OS based on S-expressions rather than text, I find I understand Emacs very poorly.

So here's the question:
What exactly are modes, buffer-local variables, hooks, and advice?

Hooks and advice look like AOP. However, Emacs hooks and advice usually make heavy use of buffer-local variables, and lots of them also interact with modes by looking at the mode variable. So I don't think AOP captures the full picture.

Sure, modes look like context-oriented programming (see the ContextL LtU thread). However, I have never seen modes associated with dynamic scope; they just get turned on or off (for a particular buffer). Is this "resembles ContextL" impression just another instance of fitting a too-general concept onto a much more specific (but not understood) one?

And finally, let me give the context from which all these questions arise: I want to clean up the Emacs model and apply it to an S-exp (tree) based editor. So the practical question is: can we find a cleaner, more elegant version of the Emacs model and generalize it to a tree document structure? Will we have node-local variables, node-local modes, etc.? If so, how will all those "attachments" interact between, say, parent and child nodes?
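
One hypothetical answer to the last question, sketched in TypeScript (my encoding, not Emacs's): node-local variables could resolve the way buffer-local variables do, with the nearest enclosing binding shadowing those above it.

// A tree node carrying its own table of 'node-local' variables.
interface TreeNode {
    parent: TreeNode | null;
    locals: Map<string, unknown>;
}

// Resolve a variable at a node by walking toward the root; the nearest
// binding wins, so children shadow parents, as buffers shadow globals.
function lookup(node: TreeNode | null, name: string): unknown {
    for (let n = node; n !== null; n = n.parent) {
        if (n.locals.has(name)) return n.locals.get(name);
    }
    return undefined; // unbound here: fall back to a global table in practice
}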

Objective-S

Morally kinda similar to ArchJava, apparently.

Objective-S is an architecture-oriented programming language based loosely on Smalltalk and Objective-C. It currently runs on macOS, iOS and Linux, the latter using GNUstep. By allowing general architectures, Objective-S is the first general purpose programming language. What we currently call general purpose languages are actually domain specific languages for the domain of algorithms. Objective-S includes an Objective-C compatible runtime model, but using a much simpler and consistent Smalltalk-based syntax. Unlike Smalltalk, Objective-S has syntax for defining classes and so can be file-based and is "vi-hackable".

Paper: Tyranny of call-return

Paper: Procedure Calls Are the Assembly Language of Software Interconnection: Connectors Deserve First-Class Status

Java / CPython Language Bridge

I've implemented support for two-way compatibility between Java and Python/CPython, meaning that Java can import standard CPython modules and use the code as if it were Java, and Python can import Java classes and use them as if they were Python classes.

Imports in both languages are done with standard import statements.

How It Works

Java code imported into Python is subjected to reflection, and then Python wrapper classes that provide runtime translations between the two language runtimes are generated using the Python API. A native Python module implements the "magic import" functionality from Java to Python (more info).

Python code imported into Java is introspected using the Python C API, and then bytecode is generated for Java wrapper classes that provide a similar runtime translation in the other direction. A custom compiler based on javax.tools is provided (QoreJavaCompiler), as well as runtime dynamic code generation support for wrapper APIs in a custom class loader (QoreURLClassLoader) - more info.

This is all accomplished using a third language called Qore that acts as a "language bridge" and also manages the references to data in each separate runtime. Because Qore has a deterministic garbage collector, strong references to Python and Java data are managed in Qore and released when there are no more valid references to the data in Qore; various tricks, including using thread-local data, are employed to manage references transparently, and there is also the possibility of using custom reference handlers to maintain the validity of objects across different language runtimes and garbage-collection implementations. A high-level description of the approach is here.

While support for importing Java into Python has already been released, importing Python into Java will be released in April 2021; it is currently working and stable.

The idea is to facilitate the mixed use of enterprise technologies (Java) and AI / data science technologies (Python).

All of the source code required has been released under permissive open source licenses (MIT).

I hope it could be interesting and useful for someone - happy to provide more information if there's interest.

John Shutt, creator of Kernel and an LtU regular, dies at 56

Apparently he was also a Wikinews contributor, as they have an obituary: Wikinews mourns loss of volunteer John Shutt

On Friday, Wikinews learned Dr John Nathan Shutt, a long-time contributor to both Wikinews and Wikibooks, died on Wednesday. Editing under the name Pi zero, he was at the time the top contributor to Wikinews by edit count, and came third on English Wikibooks. Dr Shutt was 56 years old.

Edit: John's post tracking page

Racket is ‘Scheme all the way down’ with v8 on Chez Scheme

Chez Scheme’s (nanopass) compiler produces native code for x86 and x86-64, and for ARM (from the Raspberry Pi to the Apple M1).

Racket on Chez status

Racket v8 release
