LtU Forum

Histogram: You have to know the past to understand the present by Tomas Petricek

Histogram: You have to know the past to understand the present by Tomas Petricek, University of Kent

Programs are created through a variety of interactions. A programmer might write some code, run it interactively to check whether it works, use copy and paste, apply a refactoring or choose an item from an auto-complete list. Programming research often forgets about these and represents programs as the resulting text. Consequently, thinking about such interactions is often out of scope. This essay shifts focus from programs to a more interesting question of programming.

We represent programs as lists of interactions such as triggering an auto-complete and choosing an option, declaring a value, introducing a variable or evaluating a piece of code. We explore a number of consequences of this way of thinking about programs. First, if we create functions by writing concrete code using a sample input and applying a refactoring, we do not lose the sample input and can use it later for debugging. Second, if we treat executing code interactively as an interaction and store the results, we can later use this information to give more precise suggestions in auto-complete. Third, by moving away from a textual representation, we can display the same program as text, but also in a view inspired by spreadsheets. Fourth, we can let programmers create programs by directly interacting with live previews as those interactions can be recorded and as a part of program history.

We discuss the key ideas through examples in a simple programming environment for data exploration. Our focus in this essay is more on principles than on providing fine tuned user experience. We keep our environment more explicit, especially when this reveals what is happening behind the scenes. We aim to show that seeing programs as lists of interactions is a powerful change of perspective that can help us build better programming systems with novel features that make programming easier and more accessible. The data exploration environment in this interactive essay may not yet be that, but it gives a glimpse of the future.

Rope

This is my very first post on this site. Also this is my first post regarding my new PL idea: Rope

First, an introduction to me (in a hopefully non-narcissistic way):
Skip over this if you want to avoid hearing someone talk about themselves
My name is Jocob (it was a typo from Jacob; we didn't learn about it until I was 19 and by then I was a legal adult so I had the choice to keep it). I've been programming for about 5 years. First Excel macros, then C# in Unity, and from there I decided to get a career in the field. Now I'm a "DevOps Engineer" (I don't know why they prepended a paradigm onto a job title). Professionally, I work mainly with Python, HTML/JavaScript/CSS, PowerShell, Bash, and increasingly GoLang and Ruby. Personally, I like to use C# both for Windows Apps and for Unity. I'm also building a site on AWS using Django/Bootstrap/PostgreSQL.

I have no experience or education designing programming languages.

If I have seen further than others, it is by standing upon the shoulders of giants. - Isaac Newton

I get it, and I agree with it for the most part; why waste your time doing what has already been done? What progress would we make if everyone started from how to make fire and tried making it to the moon? But there is something that I don't agree with. (Not with Newton, but with a secondary application) Should we all only blaze trails where a previous trailblazer left off? My personal experience has taught me that, if you disagree, then you'll learn more and, rarely, discover something amazing. It is with that mentality that I want to introduce "Rope."

Second, an introduction to ROPE:
Skip over this if you get annoyed reading about people's naive hopes and dreams
The idea is simple: I want a "language to create languages." Here are the main features I'm aiming for:

  • Doesn't inherit from Fortran: I want the language itself to help anyone understand exactly how they're instructing a machine to operate, so I want it to inherit directly from machine language. The dream is that someone without a computer science degree could understand how ones and zeros translate to a UI on a monitor if they took the time to read the source code.
  • Intention over implementation: Each basic function will have an input/output criteria with a standard implementation. But if someone can propose a better, faster implementation, then that function can be replaced without impacting existing code. This would mean, ideally, that people would never need to worry about losing functionality or rewriting their code in the event of an update. They could just update, and things would work better.
  • Both a Script and a Markup Language: While someone would be able to write any application they want in Rope, they would primarily use it to define a higher level language. I'd hope to eventually create ports for other major languages that would allow Rope to import and export to them. Subsequently, this would enable a "translate" feature that recreates scripts in another language. (Substituting raw Rope syntax wherever there is no translation, which would indicate to the Rope developer that the port needs to be improved.
  • Prepackaged IDE, Compiler, and Interpreter: Rope would come with everything you need to write and play right out of the box. I know personally that the faster I can see the output of my code, the quicker I can learn it. The compiler would be able to produce an executable file if you write as a script, or it could produce another compiler if you write as markup. The interpreter behaves the same as the compiler, but could also read and run ported languages as long as their is a markup file exists. This would, theoretically, allow for non-interpreted languages to be used live scripts.
  • Visually incarnatable: Because the core of the language would be simple, and to help make the language easier to use in other Countries, Rope would be able to be represented entirely visually.
  • Built-in basic kernel: To help with being cross-platform, and because Rope aims to be able to directly bridge human and machine, Rope would have an ability to interact directly with hardware. Markups would be written to allow it to read device drivers for various platforms. Ideally this would mean that it could use *Nix drivers and Windows drivers at the same time.
  • Community linked at every aspect: Each implementation, for example the "Output to Console" implementation, would have both an "intention ID" and an "implementation ID." The IDE would integrate with an online resource that allows people to discuss and collaborate on each piece of the language itself. You could, in this example, click on the "print" (or whatever it will be called) function, and see the discussion and versioning of that core function. It would allow the language to grow and be discussed much more rapidly than through mediums like GitHub, because it could be done in the middle of coding. (Though that may be distracting...) There would also be a like/dislike voting system for everything, as well as "Implementation Alternates" thought could be seamlessly swapped out during development and then automatically aggregate bench-marking from all willing participants.
  • Syntax to define syntax: This is a necessary part of the "language to create languages" goal, but there's no reason why that couldn't be used within the code itself to make things easier. For example, someone could want one script with lots of commenting, and another separate script with lots of string literals. Rope would allow them to configure each script at the start to handle those syntactical preferences.

That's all I got for now. But I'm sure I'll come up with more. The lofty goal behind the "language to create languages" philosophy is that, at the start, everyone tries to make their PL perfectly suited for its purpose. But as we've seen, languages keep expanding and descendants get spawned as a result of either success or failure. So if we had to do it all over again, wouldn't we want a language that intended to have descendants. Wouldn't it have been nice to have a language that lived for its children? That's the idea I want to explore.

Oh and as a final thought. The origin of the name is this:
Recall
Observe
Prove
Express
In terms of how information is handled, it essentially means that it can pull from either memory or the outside world, process it, and then output it back into the world. The name is meant to represent every state of information handling, because that's essentially what a computer is, and be a snappy acronym as well. In regards why "create" isn't included in my theory of information handling, there's actually a philosophical reason for that. I don't think anything in human memory comes from absolute nothingness, and that creativity is probably an illusion.

NDArray/multi-columnar with efficient CRUD operations?

I'm in the hunt for material or implementations of NDArray and/or columnar structures with not-bad support for CRUD operations (insert-update-delete) that could work in-memory.

I'm aware of kdb+ but my understanding is only for big append-only loads then calculations on it.

Currently, I have for my own little relational lang (http://tablam.org) several structures that backed the relations (BTreeMaps - Vectors - Table - Scalars). The main is a table:

https://bitbucket.org/tablam/tablam/src/41b3b0676d6062031998af48b683f8f0062bf279/core/src/types.rs#lines-221

#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
pub struct Table {
pub schema: Schema,
pub data: Vec,
}

and this is ok. But wonder what other options I could explore. (The allure for me in use just a NDArray-like container is that I can reduce my implementations to just 2 BtreeMaps - NDArray)

Session Types for Purely Functional Process Networks

Session types greatly augment purely functional programming. Session types enable pure functions to directly model process networks and effects.

We can adapt session types to pure functions by first reorganizing function calls of form `(A, B, C) -> (X, Y)` to a form that uses distinct, labeled input and output parameters, like `fn(in A, in B, in C, out X, out Y)` which we can easily rewrite to a sequential session type `?A ?B ?C !X !Y`. I'm assuming the parameters are easily distinguished, either by distinct data type or by augmenting with named parameters (such as `a:int` vs `b:int`).

Sequential session types conveniently represent that intermediate outputs are available before all inputs are provided. A simple reordering to `?A !X ?B ?C !Y` would correspond to a conventional function type `A -> (X, (B, C) -> Y)`. However, in contrast with the conventional type, the session type is recognizably a subtype of `?A ?B ?C !X !Y` or even of `!X ?A !Y ?B !D`. Thus, we can eliminate most adapter code to relax ordering in the caller. Further, a functional language designed for session types can easily support a direct programming style, e.g. based on imperative reads and writes to single-assignment parameters, thus avoiding the noise of continuation-passing style.

Sequential sessions already demonstrate a trivial model of interaction: the caller can observe the intermediate output `X` before computing inputs `B` and `C`. We can also model 'plain old structured data' types as unidirectional sessions, e.g. a type `!x:int !y:int !z:int` is essentially a record value with labels x, y, z.

Session type systems usually also support choice and recursion.

We can adapt 'choice' to pure functions by assigning a choice-label to a choice parameter and this label determines which subset of choice-specific parameters we'll use. A simple example:

type IF = &{ add: ?x:int ?y:int !r:int
           | negate: ?x:int !r:int
           }

With this definition in scope, the session type `?method:IF` could represent an external choice of 'method'. The choice-label `add` or `negate` might be assigned to implicit parameter `method.case`, and the label chosen will determine whether we further use parameters `in method.add.x : int`, `in method.add.y : int`, and `out method.add.r : int` or `in method.negate.x : int` and `out method.negate.r : int`. This is an exclusive choice, so a compiler could safely 'overlap' memory for these five parameters, similar to a C union. But unlike a conventional union or variant, the choice determines both inputs and outputs. Choice session types can conveniently model object-oriented interfaces or singular request-response interactions.

Aside: Session type systems distinguish external choice (&) vs internal choice (⊕). In the adaptation to functional programming, whether a choice is external or internal is based on whether the 'choice parameter' like 'method' in is input or output. However, it's convenient to represent some choices from the 'external choice' perspective. Thus, use of `&` above allows the type to be syntactic sugar for `{ add: !x:int !y:int ?r:int | negate: !x:int ?r:int }`. When we later provide this type as input, via `?method:IF`, the label is input and all the `!` and `?` types are flipped.

Recursive session types can further augment our functions with unbounded trees or streams of interactions. Conceptually, they allow functions to have an unbounded set of parameters, each with a unique 'path' name. A demand-driven stream type might have a form: `type Stream x = &{quit | more: !hd:x ?tl:Stream x}`. Whether a demand-driven stream has more elements is chosen by the receiver, not the sender. (Session types can also model normal streams, push-back streams, and others.) In context of recursion, our pure function logically has parameters of form `more.tl.more.tl.more.hd`, where `(more.tl)*` may recur in the parameter name an arbitrary number of times.

Use of recursive session types is similar to conventional functional programming with tree-structured data. A compiler or garbage collector can recycle memory for parameters that become irrelevant to further computation. Session types can represent many useful evaluation strategies such as call-by-need or bounded-buffer pushback. Intriguingly, session types can model 'algebraic effects' via recursive streams of request-response choice sessions.

Beyond sequencing, choice, and recursion, we can also extend functional programming with 'concurrent' sessions to represent partitioned data dependencies. For example, with function type `(A,B,C) -> (X,Y,Z)` it's possible to have a data dependency graph of form `(A,B) -> X; (A,C) -> Y; (B,C) -> Z`. It can be convenient to represent this precise data dependency graph in our session type. Fortunately, it's a simple extension to add concurrent types (though concise description, avoiding redundant expression of dependencies like `A`, is non-trivial).

Session types give us a rich model for interaction with pure function calls.

Implicitly, these interactions are between the 'call' and 'caller'. Fortunately, it is not difficult for a session-typed functional programming language to support 'delegation' such that we tunnel handling of interactions to another function call. When we begin to delegate long-lived sessions (e.g. recursive streams) between functions, the program begins to take a form of a 'process network' where pure functions are the processes and delegation models the wiring between them. Use of session types and delegation for purely functional process networks will subsume Kahn Process Networks (KPNs), which are limited to simple streams as the only interaction between processes. With session types, we can effectively model processes that rendezvous, coroutines, processes that have clear bounds on input and output, clear termination behavior.

As a summary, session types for purely functional programming supports:

  • a more convenient alternative to continuation passing style
  • function types able to directly represent object-oriented interfaces
  • a surgically precise alternative to 'call-by-need' vs 'call-by-value'
  • streaming request-response interactions for rendezvous or effects
  • process networks that tunnel interactive sessions between function calls
  • process models and interactions more flexible than Kahn Process Networks
  • opportunity to fuse loops and optimize dataflow within the network
  • type safety, subtyping, and progress guarantees for all of the above

Session types greatly improve this does not compromise functional abstraction or functional purity, except insofar as unbounded interactions with functions are not what we usually imagine from the mathematical connotations of 'function'.

I have not searched hard for prior art on the subject of session types exposing partial evaluation of pure functions as a basis for interaction and deterministic concurrency. I would not be surprised to discover all this is known in obscure corners of academia. But to me, who has recently 'discovered' this combination, this seems like one of those 'obvious in hindsight' features with an enormous return on investment, which all new functional programming languages should be seriously pursuing.

The Way-Too-Early announce: Ecstasy

It's been four years in development, and it'll still be another few years before it's ready to be used in the real world, but we've taken the wraps off the development of the Ecstasy language and pushed a public repo.

The "real" documentation including a full language spec isn't done, but the blog is meant to be a good introduction to the ideas, and relatively easy to chew: https://xtclang.blogspot.com/2016/11/welcome-to-ecstasy-language-first.html

The partial language spec, a text form of the BNF, and other docs are here: https://github.com/xtclang/xvm/tree/master/doc

At a few hundred thousand lines of code so far in the prototype, it's not a small project, but if you're interested in the workings of it: https://github.com/xtclang/xvm

Building this has been some of the most fun that I've ever had as a developer. Finally getting to _use_ it and see it actually running is one of those "sunlit summits".

Peace,

Cameron.

CFP: PLOS '19: 10th Workshop on Programming Languages and Operating Systems

Those of you who apply advanced PL ideas to operating systems might be interested in the upcoming PLOS 2019 workshop. See the abbreviated CFP below, and please consider submitting a short paper!

Thanks! ---

Eric.

(ABBREVIATED) CALL FOR PAPERS

Tenth Workshop on Programming Languages and Operating Systems (PLOS 2019)

October 27, 2019

Huntsville, Ontario, Canada
Sponsored by ACM SIGOPS
In conjunction with SOSP 2019

Paper submission deadline: August 9, 2019


Historically, operating system development and programming language development went hand-in-hand. Today, although the systems community at large retains an iron grip on C, many people continue to explore novel approaches to OS construction based on new programming language ideas. This workshop will bring together researchers and developers from the programming language and operating system domains to discuss recent work at the intersection of these fields. It will be a platform for discussing new visions, challenges, experiences, problems, and solutions arising from the application of advanced programming and software engineering concepts to operating systems construction, and vice versa.

Please visit the Web site for more info: http://plos-workshop.org/2019/

Applied Category Theory and Categorical Query Language

I'd like to announce both:

1) the open-source categorical query language CQL and visual IDE:

https://www.categoricaldata.net

based on David Spivak's functorial data migration sketch from his book/course "7 sketches on compositionality:

https://ocw.mit.edu/courses/mathematics/18-s097-applied-category-theory-january-iap-2019/index.htm

2) funding opportunities of up to $1.5M USD for start-ups who want to apply category theory (and especially, the other 6 sketches) to business problems:

https://conexus.ai/ventures

Previous discussion on Hacker News and Haskell-Reddit:

https://news.ycombinator.com/item?id=20376325#20377243
https://www.reddit.com/r/haskell/comments/bs5aiu/mit_spinout_offering_up_to_15m_to_ventures_in/

A production rule system matching algorithm with match cost logarithmic wrt the size of the knowledge base (rules + facts)

I propose an algorithm to match facts against rule sets (or against fact sets) which has a time complexity logarithmic with respect to the size of the knowledge base, measured in terms of rules + facts in working memory. As far as I am aware, the current state of the art is polynomial, with RETE derivatives.

There is a description of this algorithm in a reference implementation I have published, in Python (sorry), whose docs can be checked here:

syntreenet in gitlab

Thanks!

A pretty printing algorithm

A pretty printing algorithm which works for strict languages with an example implementation in ocaml.

For a relational language, how treat KeyValues, Streams in relations with operators that add or remove "columns"

I'm building a relational language where I wish to treat everything as a relation. The main point is to provide a common core of relational operators (aka: Queries) that work across all. A "relation" here is a table with a header of fields, columns and rows.

I have a container that match directly this, that internally is an array:



id name age
1 Jhon 18

Now, I have other containers that are Key/Value like a hashmap and BTreeMap, and single value like Generators, ranges and vectors. Only generators, as being read-only forward-only can be say to output a fixed schema.

You can say a hashmap, for example, is a relation that look like:



Key Value
1 Jhon

And a vector one like



it
1
2
3

With all of them, I wanna provide the same operations like:

table ?where #age = 18
kv    ?where #key =1 
array ?where #it = 2

Now my dilema is what to do for some operations that add or remove columns, like joins and projection:

table ?select *, #age + 10 as older
kv    ?select *, #value |> String.Upper as name
array ?select *, it * 10 as calc

What could be the best option here?

1- Say that kv/array have fixed schemas and can't grow or shrink their headers

2- Say a KV only have fixed the KEY and the value header is flattened, so it become:

[key, value, name]

The problem is not what OUTPUT. I could just output tuples. Is how REPRESENT the relation if it be more than the KV.

So, If I have 10 columns, what is the best option to put inside a KV? Consider that the relational model allow to rename columns, and add and remove them.

Another problem is that KV like Btree allow to model... trees. If I think instead of Btree as a index behind a table, the language lost the ability of "naturally" model trees. BTW, I wanna to have AGDTs.

P.D: Other potential question is: What kind of container could work alike "table" but also alike hashmaps/trees? How model a type that allow several "backends"?

P.D: Originally here: https://www.reddit.com/r/ProgrammingLanguages/comments/awkdrb/for_a_relational_language_how_treat_kv_and/ but not good answers... so I ask here with a bit of more details.

XML feed