Refining Structured Type System

This is my first more serious paper on Structured Type System.

[Abstract]
...
As every theory needs some syntactic form to express its elements, the road to a theory about theories leads through the land of syntax definition, so the structured type system, in the first place, provides a flexible generalized text parser that builds internal abstract syntax trees (ASTs) from input data. The other aspect of a theory about theories inevitably covers the meaning of the input data. This is called semantics, and this is the point where the structured type system provides the possibility to define deeper connections between the syntactic elements of ASTs. For this purpose, the structured type system uses a kind of function known from the functional programming paradigm. These functions are able to process any data corpus, whether a natural or an artificial language translation, which in turn happens to be just enough for running tasks of any complexity that analyze existing data and calculate new data from an input.
...

In short, we use BNF-ish grammars as types for function parameters and function results. Some nice constructions can be made by combining grammars and functions. One of the most important properties of the structured type system is its ability to extend grammars even outside the grammar definitions, all based on function result types. It is fairly simple: wherever a certain type of expression is expected, a grammar that results in the same type can be used, and that is where syntax extensibility comes from. Conveniently, we can combine grammar definitions and their inputs in the same source code file.

I was hoping to get some feedback and criticism from this community before attempting to get more publicity for the paper. This is an important milestone for me, and I want to thank You all for being such an inspirational community during my research.

Nice

It's very interesting and I want to mine it for ideas on extensible syntax. One thing I wonder about is the interoperability between different syntaxes and the introduction of "higher order" notations. For example, imagine I would like to share the notion of BinaryOp between boolean algebra and arithmetic expressions, and I want to introduce shorthand that applies to all binary expressions, like slice notation, e.g. (< 2). Can that be encoded in a natural way? Or even more challenging, could you introduce a shorthand for introducing new binary operators that allows you to name their precedence relative to other operators?

Higher order notations

could be managed either by (1) dependent terms (aka dependent types), or by (2) metaprogramming handled in a natural way of thinking. In the case of metaprogramming, after applying parameters to a function, the function could return case-expression-controlled parts or wholes of sequences, alternations, implications, or even other functions. It seems that metaprogramming takes a full swing with the structured type system.

Precedence of operators

Operator precedence is a somewhat complex matter. It is meant to be handled starting from the first syntax definition, taking care over further syntax extensions. Let's see an example of a simple math calculator:

@Parsed (
    Sum <- (
        (
            Add <- (
                Left <- @Sum,
                "+",
                Right <- @Fact
            )
        ) => @DefaultMath (@Add.Left + @Add.Right) |
        Fact <- (
            (
                Mul <- (
                    Left <- @Fact,
                    "*",
                    Right <- @Primary
                )
            ) => @DefaultMath (@Mul.Left * @Mul.Right) |
            Primary <- (
                Integer <- @Int
            )
        )
    )
)


Now we want to extend this syntax with an exponentiation operator. To control its precedence, we have to pick symbols at the depth of interest. If we want it to have the highest precedence, we have to put it between the symbols "@Sum.Fact.Primary" and "@Sum.Fact.Primary.Integer":

@Parsed (
    (
        Pow <- (
            Left <- @Sum.Fact.Primary,
            "^",
            Right <- @Sum.Fact.Primary.Integer
        )
    ) => @DefaultMath (@Pow.Left ^ @Pow.Right)
)


If we want the same precedence as multiplication, we put it between symbols "@Sum.Fact" and "@Sum.Fact.Primary":

@Parsed (
    (
        Pow <- (
            Left <- @Sum.Fact,
            "^",
            Right <- @Sum.Fact.Primary
        )
    ) => @DefaultMath (@Pow.Left ^ @Pow.Right)
)


If we want the same operator precedence as addition, we put it between symbols "@Sum" and "@Sum.Fact":

@Parsed (
    (
        Pow <- (
            Left <- @Sum,
            "^",
            Right <- @Sum.Fact
        )
    ) => @DefaultMath (@Pow.Left ^ @Pow.Right)
)


But if we want the lowest precedence, we have to write a wrapper above "@Sum":

@Parsed (
    Exp <- (
        (
            Pow <- (
                Left <- @Exp,
                "^",
                Right <- @Sum
            )
        ) => @DefaultMath (@Pow.Left ^ @Pow.Right) |
        Sum <- @Sum
    )
)


In all cases, we can call "@Sum (1+2*3^4)".
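For readers more at home with conventional parsers: the layering above corresponds directly to a recursive-descent parser in which each grammar symbol (Sum, Fact, the spliced-in Pow, Primary) becomes one parsing function, and the level a function occupies fixes its operator's precedence. A minimal Python sketch (the function names are mine, purely illustrative, not part of the system):

```python
import re

def tokenize(src):
    return re.findall(r"\d+|[+*^]", src)

# One function per precedence level, mirroring Sum -> Fact -> Pow -> Primary.
def parse_sum(toks):        # lowest precedence: Sum <- Sum "+" Fact | Fact
    v = parse_fact(toks)
    while toks and toks[0] == "+":
        toks.pop(0)
        v = v + parse_fact(toks)
    return v

def parse_fact(toks):       # Fact <- Fact "*" Pow | Pow
    v = parse_pow(toks)
    while toks and toks[0] == "*":
        toks.pop(0)
        v = v * parse_pow(toks)
    return v

def parse_pow(toks):        # the spliced-in level: Pow <- Pow "^" Primary | Primary
    v = parse_primary(toks)
    while toks and toks[0] == "^":
        toks.pop(0)
        v = v ** parse_primary(toks)
    return v

def parse_primary(toks):    # Primary <- Integer
    return int(toks.pop(0))

print(parse_sum(tokenize("1+2*3^4")))  # 163
```

Moving the "^" loop into a lower or higher function is exactly the "put it between symbols" operation described above.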

Numeric Precedence

Personally I prefer to define the precedence of an operator as a number, which allows the precedence of any operator to be changed independently of other operators. This can be achieved with something like a Pratt parser.
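For comparison, here is a bare-bones Pratt-style parser in Python where each operator's precedence is a single number, adjustable independently of the others (the binding-power table and the right-associativity of "^" are my own illustrative choices):

```python
import re

# Binding power per operator: one number each, independently adjustable.
BP = {"+": 10, "-": 10, "*": 20, "/": 20, "^": 30}
RIGHT_ASSOC = {"^"}

def tokenize(src):
    return re.findall(r"\d+|[+\-*/^]", src)

def parse(toks, min_bp=0):
    left = int(toks.pop(0))
    # Keep consuming operators that bind at least as tightly as min_bp.
    while toks and BP.get(toks[0], -1) >= min_bp:
        op = toks.pop(0)
        # Right-associative operators admit the same binding power on the
        # right; left-associative ones require strictly higher.
        next_bp = BP[op] if op in RIGHT_ASSOC else BP[op] + 1
        right = parse(toks, next_bp)
        left = {"+": left + right, "-": left - right,
                "*": left * right, "/": left // right,
                "^": left ** right}[op]
    return left

print(parse(tokenize("1+2*3^4")))  # 163
```

Changing one entry in BP reorders that operator relative to all others without touching any grammar structure.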

Metaprogramming

I'd have to check it with metaprogramming. Theoretically, it should be possible to generically construct a compound term with this or that precedence of its subterms, as a result of a function which would take the subterms and their precedence numbers as parameters.

A shortcut to switching precedence order of binary operations

Here it is:

(
    List <- ( // a typed list
        (Type <- @Any) => (
            Item <- @Type,
            Next <- (@List (@Type) | @Unit) // @Unit is used to denote the end of a list
        )
    ) |

    BinOp <- ( // afterwards we will pass these in a list
        Op <- @String,
        Calc <- (
             (Left <- @Int, Right <- @Int) => @Int             
        ) // and we will pass functions needed for calculations here
    ) |

    Main <- (
        BinOpCalc <- (
            (
                Operators <- @List (@BinOp) // here we take a list of operators
            ) => @Parsed (
                (
                    Iterator <- (
                        (I <- @List) => (
                            @Iff (@I == @Unit,      // if we reached the last item
                                @Int,               // this is recursion terminator
                                Parent <- (         // otherwise, this is basic parser parent-child pattern
                                    (
                                        (
                                            // here we parse specific operator
                                            (Left <- @Parent, @I.Item.Op, Right <- @Child)
                                        ) => (
                                            // and here we calculate it
                                            @I.Item.Calc (@Left, @Right)
                                        )
                                    ) |
                                    Child <- @Iterator (@I.Next) // recursing the iterator
                                )
                            )
                        )
                    )
                ) (@Operators) // we apply the list to iterator
            )
        )
    )
).Main // exposing BinOpCalc to outer space

...

// default math precedence order. Precedence is determined by a position in list
x <- @BinOpCalc (
    ("+", (Left <- @Int, Right <- @Int) => @DefaultMath (@Left + @Right)),
    ("*", (Left <- @Int, Right <- @Int) => @DefaultMath (@Left * @Right)),
    ("^", (Left <- @Int, Right <- @Int) => @DefaultMath (@Left ^ @Right))
)
...
x1 <- @x (1+2*3^4) // x1 now holds the result of 1+(2*(3^4)), that is 163

...

// reversed precedence order
y <- @BinOpCalc (
    ("^", (Left <- @Int, Right <- @Int) => @DefaultMath (@Left ^ @Right)),
    ("*", (Left <- @Int, Right <- @Int) => @DefaultMath (@Left * @Right)),
    ("+", (Left <- @Int, Right <- @Int) => @DefaultMath (@Left + @Right))
)
...
y1 <- @y (1+2*3^4) // y1 now holds the result of ((1+2)*3)^4, that is 6561

...

// mixed precedence order
z <- @BinOpCalc (
    ("^", (Left <- @Int, Right <- @Int) => @DefaultMath (@Left ^ @Right)),
    ("+", (Left <- @Int, Right <- @Int) => @DefaultMath (@Left + @Right)),
    ("*", (Left <- @Int, Right <- @Int) => @DefaultMath (@Left * @Right))
)
...
z1 <- @z (1+2*3^4) // z1 now holds the result of (1+(2*3))^4, that is 2401
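A rough functional analogue of @BinOpCalc, assuming nothing about the author's system: a Python sketch that builds an evaluator from an ordered operator list, with precedence decided purely by list position (earlier entries bind less tightly):

```python
import re

def make_calc(ops):
    # ops: list of (symbol, function); earlier entries bind less tightly,
    # so precedence is decided purely by position in the list.
    def level(toks, i):
        if i == len(ops):                  # past the last operator: an integer
            return int(toks.pop(0))
        sym, fn = ops[i]
        v = level(toks, i + 1)             # parse the tighter levels first
        while toks and toks[0] == sym:     # then fold this level's operator
            toks.pop(0)
            v = fn(v, level(toks, i + 1))
        return v
    return lambda src: level(re.findall(r"\d+|[+*^]", src), 0)

x = make_calc([("+", lambda a, b: a + b),   # default math order
               ("*", lambda a, b: a * b),
               ("^", lambda a, b: a ** b)])
y = make_calc([("^", lambda a, b: a ** b),  # reversed order
               ("*", lambda a, b: a * b),
               ("+", lambda a, b: a + b)])
z = make_calc([("^", lambda a, b: a ** b),  # mixed order
               ("+", lambda a, b: a + b),
               ("*", lambda a, b: a * b)])

print(x("1+2*3^4"), y("1+2*3^4"), z("1+2*3^4"))  # 163 6561 2401
```

The three results match x1, y1 and z1 above.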

Same precedence

What about operators like "+" and "-" that, as binary operators, normally have the same precedence?

It is fairly simple

It is fairly simple to do. In the "basic parser parent-child pattern" part (see the commented code above), we have to generically unfold the elements that share the same operator precedence into an alternation representing them, instead of a single element between Parent and Child. Of course, to express the same precedence, we also have to change the structure of the input parameters.

For example, here is how to generically construct an alternation from a list:

(
    Iterator <- (
        (I <- @List) => (
            @Iff (@I == @Unit, // remember that @List is terminated by @Unit
                @None, // we terminate the recursion by the empty alternation
                (
                    @I |                 // this is an alternation member
                    @Iterator (@I.Next)  // this is the rest of alternations
                )
            )
        )
    )
) (@SomeList)

Of great usability are the terms @None (which stands for the empty alternation) and @Unit (which stands for the empty sequence).

@None has these properties:

(X | @None) === X
(@None | X) === X
(X | @None | Y) === (X | Y)


while @Unit has these properties:

(X, @Unit) === X
(@Unit, X) === X
(X, @Unit, Y) === (X, Y)


@None can be used to terminate generic alternations, while @Unit can be used to terminate generic sequences. The third term of the triad is @Any, which stands for any conceivable term in the structured type system, and can be used to pass around unapplied or applied terms (so functions too, as a function is a term).
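The generic alternation above, with @None as its recursion terminator, has a straightforward functional analogue. In this Python sketch (my analogy, not part of the system), the role of @None is played by a matcher that always fails, i.e. the identity element of alternation:

```python
def alternation(items):
    # Recurse down the list like the Iterator above; when the list is
    # exhausted, return the analogue of @None: a matcher that always
    # fails, the identity element of alternation.
    if not items:
        return lambda tok: False
    head, rest = items[0], alternation(items[1:])
    return lambda tok: tok == head or rest(tok)  # head | rest-of-alternation

match_op = alternation(["+", "-", "*"])
print(match_op("*"), match_op("/"))  # True False
```

Because the terminator always fails, appending it changes nothing, which is exactly the (X | @None) === X law.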

These are great questions. I think I'll include them in the final version of the paper if I have your permission.

Associativity for Precedence

Personally I find numeric precedence hard work: is 1 most binding or least binding? (Different languages vary.)

Are 9 precedence levels enough? Consider (for example) all the operators introduced in the Lens libraries, or in a typical DSL.

Yes you need to be able to change the precedence for individual operators. Perhaps there are different ways to skin that cat.

Suppose you could declare precedences as Associativity rules, and let the compiler figure it out. Your rules look like:


-- `==>` is the rewrite-as declaration
x + y ^ z * w ==> x + ((y ^ z) * w) ;
x > y && z + w == v -> NOT q ==> ((x > y) && ((z + w) == v)) -> (NOT q)

So those are examples rather than rules?

In that case, you could adopt a convention that all parens in the example should be removable and just write:

precedence example ((x > x) && ((x + x) == x)) -> (NOT x)

An operator precedence technique

In the paper Operator Precedence for Data-Dependent Grammars, found on the Iguana parser site, I just saw a nice and clean method for managing operator precedence when dealing with data-dependent grammars (as they call them). Translated to the structured type system, we get:

(
    Exp <- (
        (p <- @Int) => @Case (
            (@p >= 3) => (@Exp (3), ('+' | '-'), @Exp (2)) | 
            (@p >= 2) => (@Exp (2), ('*' | '/'), @Exp (1)) | 
            (@p >= 1) => (@Exp (1), ('^' | '√'), @Exp (0)) | 
            (@p >= 0) => /[0-9]+/
        )
    )
) (3)
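The same data-dependent trick can be sketched in Python by threading the precedence parameter p through a single parsing function (the '√' alternative is omitted here, and the evaluation of matched operators is my addition):

```python
import re

# Level p admits exactly that level's operators; p - 1 parses the tighter
# levels, mirroring Exp(3) -> Exp(3) ('+'|'-') Exp(2), and so on.
LEVELS = {3: "+-", 2: "*/", 1: "^"}

def tokenize(src):
    return re.findall(r"\d+|[+\-*/^]", src)

def exp(toks, p):
    if p == 0:                        # (@p >= 0) => /[0-9]+/
        return int(toks.pop(0))
    v = exp(toks, p - 1)              # left recursion unrolled into a loop
    while toks and toks[0] in LEVELS[p]:
        op = toks.pop(0)
        rhs = exp(toks, p - 1)
        v = {"+": v + rhs, "-": v - rhs, "*": v * rhs,
             "/": v // rhs, "^": v ** rhs}[op]
    return v

print(exp(tokenize("1+2*3^4"), 3))    # 163
```

Calling exp with the top level 3 corresponds to the final "(3)" application in the grammar above.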


I never doubted that human ingenuity has no limits, and this is no exception. Ali Afroozeh and Anastasia Izmaylova, thank You :)

Shorthand

Shorthand that applies to all binary expressions? I'm not sure what you mean.

Sections of operators

In Haskell, you can write (+ 1) for (\x -> x+1) or (< 1) for (\x -> x < 1). Essentially it partially applies the operator to a single operand, producing a unary function. This notation is preferably something you'd specify once for all binary operators.
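As a point of comparison, such sections can be derived generically from any binary function. A small Python sketch (the names rsection and lsection are mine, purely illustrative):

```python
import operator

def rsection(op, y):
    # right section: (op y), e.g. (< 1) becomes lambda x: x < 1
    return lambda x: op(x, y)

def lsection(x, op):
    # left section: (x op), e.g. (1 +) becomes lambda y: 1 + y
    return lambda y: op(x, y)

less_than_1 = rsection(operator.lt, 1)
add_3 = rsection(operator.add, 3)
print(less_than_1(0), add_3(10))  # True 13
```

The point is that the two section-forming rules are written once and then work for every binary operator.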

Slice

Here it is:

(
    Op <- (
        Plus <-  "+" |
        Times <- "*" |
        Power <- "^"
    ) |

    OpImplementation <- (
        (Left <- @Int, In <- @Op, Right <- @Int) => (
            (
                @Op.Plus  => @DefaultMath (@Left + @Right) |
                @Op.Times => @DefaultMath (@Left * @Right) |
                @Op.Power => @DefaultMath (@Left ^ @Right)
            ) (@In)
        )
    ) |

    Main <- (
        Slice <- (
            @Parsed (
                Op <- @OpImplementation.In,
                Right <- @OpImplementation.Right
            ) => (
                (Apply <- @Int) => @OpImplementation (@Apply, @Op, @Right)
            )
        )
    )
).Main
...
x <- @Slice (+3)     // x now holds (Apply <- @Int) => @OpImplementation (@Apply, "+", 3)
...
y <- @x (10)         // y now holds 13
...
z <- @Slice (*3) (4) // z now holds 12


I left out formalities about handling optional whitespace.

Type specific languages

From your description, I am reminded of Jonathan Aldrich's work on the Wyvern language regarding type specific languages.

I generally like the idea of using the type system to guide interpretation of a program, except for the part where it doesn't easily work with 'interpretation' insofar as we need a static type checker and to process a bunch of grammars. My own approach to language extension has taken a decidedly different direction - focusing on projectional editors and editable views. But I'll take a deeper gander at your work, and I should probably review Wyvern's again too.

Graphical interfaces

Yes, I remember your idea of graphical interfaces to source code, dmbarbour, and I liked it very much. There is big potential in that idea. What we have in the computer industry today are hard-coded WYSIWYG editor solutions. Imagine if the users could be the ones who define such interfaces, which could extend to areas other than WYSIWYG editing.

[Edit]
Each function that could have its own customized graphical interface would make for much easier learning material. It would be a huge step for programming science.

Abstract Categorial Grammars

If you haven't done so yet, you might also be interested in checking out:

Towards Abstract Categorial Grammars

Reducing the number of primitive operators

I've managed to express the syntax-building operator "<-" through the function operator "=>" with its left side immediately applied! The type system then constructs a sequence flattened to the terminal leaves of the whole expression. It turns out that the type-checking system infers the left-hand function sides from those leaves, playing the role of a BNF-ish parser.

This is the "<-" definition as non-primitive operator:

@Parsed (
    (A => @Symbol) (@A),
    '<-',
    (B => @Any) (@B)
) => @Parsed (
    (@A => @B) (@A)
)

Conclusion: Immediately applying all left sides of any function system transforms the function system into a parser system.

Suggestions?

I'm extending the paper to include some serious examples. I'm thinking of implementing some axiomatic set theory, but I'm not sure which one is the most current. Simplicity also matters: I would like an optimally small number of definition rules that achieves the most complete realization.

Is Zermelo–Fraenkel set theory enough to get mathematicians interested, or does anyone have other suggestions?