Basic building blocks of a programming language

I've tried to understand category theory, but while I can't say I've got the mathematical skills to understand it, I think I kind of grasp what it intends to do. It is basically a way of building bigger things out of smaller things as flexibly as possible.

Designing my language I get inspired by this thought, and started to think along those lines. This means trying to realize what are the basic building blocks from which to build larger blocks, i.e. from where do I start?

In an attempt to figure this out, I thought about what languages typically tries to do. What I found so far are:

1. Calculate data, i.e. from data produce some other data, i.e. functions
2. Organize data into hierarchy (lists, structs, maps, let's call these types just to have some name)
3. Find patterns in data

The idea here is to make these elements as versatile as possible, so they can be combined without restriction as long as it is reasonable. Allow functions to generate other functions, hierarchies and/or patterns. Allow hierarchies to contain functions, other hierarchies and/or patterns. Allow patterns to describe functions, hierarchies and/or other patterns.

First of all, do you agree that these three could be used as basic building blocks, or is there something missing or wrong?

Secondly, the pattern. I can see patterns used in two that seems like distinctly different ways. One is that you write a template like code, which you could see as a patter
n. You insert a few values and out comes a type, a function, or something. Another way of using it would be to say this is an expected pattern with some variables in it, th
en apply data to it, and if it matches the pattern you get the value of the variables the pattern contained.

Perhaps those two cases of patterns should be named differently? Any thoughts on this?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Category theory to the rescue?

Seems like no-one understood my question, given the absence of any answers. Not sure I did either, but I have given this some more thoughts. Trying to figure out what I really want.

I've noticed that it seems popular to think category theory when it comes to type system. I expect this is where I should go as well.

But I'm a bit confused. For example, type theory seems to be something else (but related?) than category theory. And it seems the move is towards category theory rather.

I also read about simple-sub, a simplified MLSub. It uses "negative" (constrained) types for inputs and "positive" (union) types for output. For me it feels like a work-around, but I might be wrong here... Is there a good reason for this or is this because of short-comings of the type system used?

<Removed>

Copy due to server error

Give it a few days...

It seems that the site has slowed down a lot in the last few years. Post something, then don't worry too much about non-response until it's been several days. I think a lot of regulars may not get back here more than once or twice a week anymore.

For me the building blocks of a programming language have always been expressed by the question 'well what can I do with it?' and that seems to relate more to basic sets of libraries for using the machine. Can I put up a native GUI? Can I do math using functions that someone has to actually understand to program in machine language? Can I format, display, and print reports that aren't just text? Is there a standard way to save and restore data structures that are structured with dynamic values like data addresses that we don't want to treat as constants because we want to be able to rebuild the structure even if some of those locations are in use? Is there a standard way to format and interpret binary datagrams using common data representations that don't happen to be the underlying binary representations for our local data?

Can I read and write rich-text/html/pdf/whatever? How about matrix regressions and other machine learning? Can we handle bignums? Can we handle bignums without being likely to accidentally consume all memory and crash in what should have been a simple calculation? Do "numbers" have strange conversion rules that make no sense at all unless you know the binary implementation of each numeric representation?

And these questions are all about the very prosaic pragmatics that I think of when I haven't even begun to think 'category theory.'

And then I think of other pragmatic issues.

Is looping and flow of control practical, syntactically comprehensible, and flexible? Does a tail recursion lead to a stack overflow?

How do variable scope and lifetime work?

Is the language multithreaded, and if so does the programmer have to deal with locks and mutexes?

Is there automatic garbage collection, and if so are there any timeliness guarantees (eg, when it is IMPORTANT that sensitive information is erased in a timely way can the GC deal with the requirement)?

When types are derived from other types, is it by inheritance, composition, interface conformance, interface composition, template derivation, fixed-point operations ... or what?

Does the language have exceptions, and if so has it found any possible way to handle them gracefully? If not, does it make any other way available to handle conditions the implementor may have thought someone might possibly have wanted to sometimes throw an exception for? Overreliance on "wobbly" overambitious exception semantics is a particular bane of recent programming languages, often making it impossible to detect and handle errors correctly.

And so on. All pragmatics all the time.

And as you've doubtless already realized, also all category theory all the time. The handling each and all of these pragmatic issues lives at the intersection of three or four branches of category theory.

The important part of language design, IMO, is that a programming language revolves around addressing the pragmatic issues. The category theory is just a way of organizing them and trying to work out ways to address them all that don't introduce inconsistencies with each other. if you don't have the pragmatic issues in the top of your mind first, you won't have any way to judge the worthiness of your 'building blocks' to see if you are proposing a category system capable of dealing with them.

Why category theory

The pragmatic problems I want to solve include (in addition to what you mentioned) be able to read a data file whose format can be described in the language so the syntax of the file gets automatically checked when reading. It also includes being able to give support for error messages where you could say that "you forgot to set environment variable x", not because you hard-coded that knowledge, but because your code could realize that we couldn't get a value and this value was supposed to be in an environment variable. A kind of dependency tracking done in runtime.

Another area I feel is important is how to live with external dependencies, not just other modules written in the same language, but other binaries and libraries that your application might be dependent on. How do I describe and handle those dependencies? How can I write a proper error message saying what I miss? And how to know which version works and which doesn't, not just be hard-coded knowledge about version numbers, but by knowing whether the binary/library provides what I need?

And in the end I want something that has a very simple and easy to understand foundation which can be composed into solving complex problems like the above.

Right now I'm trying to figure out what kind of type system I would need to be able to solve the above kind of problems. I think category theory might help me out here, but I currently know too little about it to really know. But it seems at least to have the right kind of thinking to at least give me inspiration for how to design the type system.

And yes, I've looked a bit on Haskell, but have found the language too intimidating. It has a way to complex syntax, and has quirks that I feel are hard to swallow. But I haven't done any serious code in it so it might just be me not knowing it well enough...