Writing a new language - ideas?

Hi, I want to write a new language. However, I need something new to justify writing it. I imagined a system where all code would be transformed to bytecode via macros - turns out that has been done. I imagined objects that were made by copying - turns out that has been done in prototype-based OOP. I imagined functions and objects being the same thing - turns out that is just first-class functions. Can people on this forum suggest anything? I would prefer ideas that target the key principles below:

1. makes code shorter. I want to be able to describe client-server programs with ten lines or less. I want to be able to write gui code with just a data structure and telling it to represent itself. How? Standard templates? Code generation of gui, protocols? Transparent functions and data?

2. makes code less error prone. Typos and logic errors should be the only kinds that can exist. How can we reduce errors by making them impossible to express?

3. makes intention/purpose over implementation. In a perfect world, programming would be done with strong AI assistants (paid in CPU hours). Without AI, what methods would we use to let intention guide (potentially automatic) implementation?

4. makes reading, sharing, and reusing over writing. OOP was touted as the solution to sharing. Is it?

5. unifying

What I am not interested in are static/dynamic, typed/untyped, etc. debates.
New algorithms are pretty much irrelevant to this, since they just do things better.
Nor am I interested in another algebra that unifies logic into a computable framework.
Make sure your ideas do not require O(e^N) time :D

Cheers!


1. makes code shorter. I

1. makes code shorter. I want to be able to describe client-server programs with ten lines or less. I want to be able to write gui code with just a data structure and telling it to represent itself. How? Standard templates? Code generation of gui, protocols? Transparent functions and data?

I recall someone affiliated with the Fonc Project describing a very short, very declarative implementation of a TCP stack.

2. makes code less error prone. Typos and logic errors should be the only kinds that can exist. How can we reduce errors by making them impossible to express?

You said you weren't interested in the dynamic/static debate, but you might want to reconsider that in light of that comment. That's all I'll say on that matter :)

3. makes intention/purpose over implementation. In a perfect world, programming would be done with strong AI assistants (paid in CPU hours). Without AI, what methods would we use to let intention guide (potentially automatic) implementation?

I'm not going to comment on AI assistants, but since you have brought up "intention", you might want to take a look at Intentional software and what the people at Intentsoft.com are doing.

4. makes reading, sharing, and reusing over writing. OOP was touted as the solution to sharing. Is it?

Brad Cox, who invented Objective-C, used to talk about pluggable "software circuits" and the market that would develop because of them, but we seem to be still stuck in "libraries" mode.

I'll never be a PLT theorist - I'll leave that to the brains around here. But maybe I would consider myself an SET (Software Engineering Theorist). With that in mind, I consider tooling to be an underappreciated topic (well, in some quarters of software development). I think we have a lot to learn from the Smalltalk style of development. I have great hopes for what the guys and gals at Intentsoft are doing with Intentional programming. I think the people doing the FONC project have some interesting ideas.

But I also believe that programming languages and tooling are intertwined and that you really have to think about them together.

I want to be able to write

I want to be able to write gui code with just a data structure and telling it to represent itself.

Make sure you can write your data structures declaratively, and that each object can have both attributes and children (à la XML, but preferably with richer attributes than just strings).
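
As a rough sketch of what "attributes plus children" can look like once attributes are richer than strings, here is a small OCaml example (the node and attribute names are made up for illustration):

(* A hypothetical widget tree: each node has a tag, typed attributes,
   and a list of child nodes - like XML, but with real values. *)
type attr =
  | Text of string
  | Width of int
  | OnClick of (unit -> unit)

type node = Node of string * attr list * node list

let ui =
  Node ("window", [ Text "hello world"; Width 640 ],
        [ Node ("button", [ Text "Exit"; OnClick (fun () -> exit 0) ], []) ])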

Without AI, what methods would we use to let intention guide (potentially automatic) implementation?

Take a look at Epigram. It has such strict types that it can guide you a lot, because there is very little a programmer could actually do at each point in the program.
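
As a tiny illustration of the general principle (not Epigram itself): the more precise a type is, the fewer programs inhabit it, so the type alone already guides the implementation. For example, in OCaml:

(* The polymorphic type 'a -> 'a leaves essentially no choice:
   the only total implementation is the identity function. *)
let id : 'a -> 'a = fun x -> x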

Make sure you can write your

Make sure you can write your data structures declaratively, and that each object can have both attributes and children (à la XML, but preferably with richer attributes than just strings).

If you need inspiration, at the moment, XUL is a good example of XML-based declarative UI, JavaFX a good example of a DSL for declarative UI, while Chris King's Functional Reactive UI toolkit for OCaml is a good example of declarative UI without the need for a specific language.

Take a look at Epigram. It has such strict types that it can guide you a lot, because there is very little a programmer could actually do at each point in the program.

mmmhhhh.... you're giving me ideas for next year's lectures

If you haven't designed a

If you haven't designed a language before, it's helpful to get some experience re-inventing wheels. Learning doesn't have to be "justified." Also, you should study existing languages and steal ideas whenever possible. As a thieving language designer, I started with a linking language (Jiazzi) based on something baked (program units) in a new environment (Java). I moved up to other languages like SuperGlue, this time based on FRP, Self, Prolog, and others. Now I'm looking at natural language and trying to steal ideas from AppleScript.

My advice: start simple and move up. Anything you do has probably been done before in some form (probably by the 70s, maybe even in the 60s), but what you can contribute is understanding and progressively more refined implementations and applications.

There is no escape...

...from most of the things that you say you want to ignore. In particular, a new language that is not meant to be a re-tread of the various iconic languages supporting the concepts you've mentioned, that limits errors to those of logic alone as much as possible, and that is as declarative as it sounds like you wish it to be, is going to be faced foursquare with exactly "an algebra that unifies logic into a computable framework" and the "static/dynamic, typed/untyped, etc. debates" - at least if you're serious about maximizing your chances of a successful design (and even then you're still not likely to design a language that becomes popular). You can no more ignore the Curry-de Bruijn-Howard Isomorphism than you can the law of gravity given the goals that you've set for yourself.

With all of that said, if I were you, I would be paying a lot of attention to the space of type-theory-based interactive theorem provers such as Coq and the efforts behind using it as a programming language in its own right, as well as using it to develop certified compilers for powerful programming languages. From the other end, I would pay attention to dependently-typed programming languages such as the aforementioned Epigram, Sage, Cayenne... as well as how far you can go down this road without dependent types (e.g. lightweight static capabilities). Finally, since you seem to be interested in declarative languages, you'll want to pay attention to Mercury and, I would think, Oz. I think Ben Moseley's work on Functional Relational Programming is important. There's a ton of work to do around Bell-LaPadula and type theory, certified program verifiers, information flow security, and proof-carrying code. A language that makes it easy to express "Secure Property Titles With Owner Authority" and "A Formal Language for Analyzing Contracts," both by Nick Szabo, would be an eminently worthwhile goal.

So as you can see, there's a lot to do—the above just reflect my own off-the-cuff thoughts and therefore are not even representative, let alone comprehensive—but the unifying themes behind them all are precisely a lot of deep type theory, logic, etc. that you ignore, IMHO, at your peril.

You can no more ignore the

You can no more ignore the Curry-de Bruijn-Howard Isomorphism than you can the law of gravity given the goals that you've set for yourself.

I recommend a gravity-agnostic approach to language design, as examples involving the space shuttle are especially important for languages touting theorem proving.

Gravity is everywhere

Space shuttles that ignore gravity tend not to launch in the first place. Even after they're in space, gravitational effects (including microgravity) are fundamental to their behavior. Perhaps the analogy still fits.

Yes, I saw it coming as soon

Yes, I saw it coming as soon as I clicked 'post comment' (I had in mind a simple model of gravity when I added the bit about the space shuttle). I'll stand by my assertion, though, that languages should be gravity agnostic :).

Gravity: not only a good idea, it's the law

Languages should at least be physics believers, so they can handle units correctly and avoid the "feet + meters = profit!!" bugs that plague our space expeditions.

physical types?

It wouldn't be bad to have classes for money, money/second, money/hour, meters, meters/hour, money/meter, etc. Totally off topic, though.
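
Not entirely off topic: even without dedicated language support you can get some of this with phantom types. A minimal OCaml sketch (the unit names are just for illustration; ratio units like money/second would need more machinery):

(* A quantity tagged with a phantom unit parameter: the float is the value,
   the type parameter records the unit and never exists at run time. *)
type 'u qty = Qty of float

type meters
type seconds

let meters x : meters qty = Qty x
let seconds x : seconds qty = Qty x

(* Addition is only allowed between quantities of the same unit. *)
let add (Qty a : 'u qty) (Qty b : 'u qty) : 'u qty = Qty (a +. b)

let ok = add (meters 3.0) (meters 4.0)
(* let bad = add (meters 3.0) (seconds 4.0)   <- rejected by the type checker *)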

The dirty laundry of theory

Paul has it right that a competent language designer cannot escape PLT and the related maths--especially if someone wants to do something that is new, useful, and rigorous. (If all you are doing is copying an existing language and making subtle changes, then you can ignore theory for the most part... but why bother, other than for educational reasons?) Likewise, if you want to build automobiles and have them be decent designs, you will need to have a good understanding of (or have people working for you who do) mechanical engineering, chemistry, thermodynamics, materials science and metallurgy, electronics and electronic engineering--and that's just the powertrain.

However--there is room for advancement in liberating the user from understanding these things. 100 years ago, driving a car invariably meant knowing how to repair one. Breakdowns were frequent, professional mechanics were still unknown, and even operating an automobile meant knowing how to manually control a choke, change gears without a modern crutch, crank an engine by hand, etc. Nowadays, many people who have never touched either a manual transmission or a torque wrench drive cars--and do so competently.

Likewise with PLT--reducing the barrier to entry to programming is one useful advancement in the art. (It's not the only one, of course--increasing the productivity of professional programmers of various skill levels is also important, and these goals may be served by different languages.) Many of the languages we talk about here still contain artifacts of obscure fields of math (obscure meaning "not part of the standard undergraduate curriculum")--monads being one example, much of modern type theory being another--and as such are not very accessible to the "grunt" programmer. In the research phase, that's not a problem--but in production, it is.

So if your intent is to make it easier for programmers to program, that's an excellent goal. But a programming language (along with its implementations and tools) is ultimately an engineering artifact, whose design demands at least some familiarity with the underlying theory. Not necessarily expertise--knowledge of a particular problem domain or industrial requirement may be just as important, and many "benevolent dictators" of common industrial languages aren't PLT theorists.

Stealing and reusing

Stealing and reusing is good; obviously the language would have to be realistic, in the sense that it would look like something that has been made before. And since I'm not a genius, I doubt I could invent something entirely new anyway - so I'm asking for ideas here. But I feel that programming a language just for the sake of learning is educational but not time efficient, so if there is at least a new feature of some sort it would be more interesting. Also, just because I want a new feature does not mean that the language won't be somehow functional/imperative/whatever is possible.

David and Paul, thank you for the links to various projects, it will help me learn more about the languages.

But I also believe that programming languages and tooling are intertwined and that you really have to think about them together.

In terms of tools, I also believe they are important. I was considering example-based programming, where you could take data and perform your specific operation, and it would be recorded like a macro, except with the potential for guessing where loops should be. Navigation tools are quite important too as code bases become big. So perhaps a good idea on that front would be welcome.

However, I feel most tools add to the language rather than forming an integral part of it. So with a good foundation for the language (assuming I can even get that far), the next step would be tools.

but the unifying themes behind them all are precisely a lot of deep type theory, logic, etc. that you ignore, IMHO, at your peril.

Correctness in the pragmatic sense and in the mathematical sense are good if your language allows you to express incorrect things. Array overflow is impossible in Java. Correctness of the result, however, would perhaps be better tested by people using the program rather than by theorem provers. Perhaps in large/multithreaded/multi-computer situations provable correctness is a big issue.

You can no more ignore the Curry-de Bruijn-Howard Isomorphism than you can the law of gravity given the goals that you've set for yourself.

However, is it possible to box off these difficult-to-discern issues into modules separate from ordinary code that is declaratively correct? We usually do not consider whether a linked list works or not, because we have boxed off all proof that it works away from the user. To the user, as long as they do nice things to a linked list, it will give correct results. The interior designer, or the electrical engineer in a rocket, does not care about whether it flies or not - they make their part work. As for whether that needs as much mathematical sophistication, we shall see.
Although typing does not help with logic errors, does it?

Why a new language?

I think what you have to consider first is why you want to build a new language. Too many intellectual resources have been spent on building languages that have never been used by more than a few people. A new language is good if it solves problems that existing ones haven't. Otherwise, it is just an example of people trying to show how smart they are and failing. Please tell me what you want to do with your new language.

Solving new problems is

Solving new problems is good, but what if the new language can solve old ones more efficiently, or more conveniently? Optimization might be just as important to a program as solving problems that others don't.

Purpose

Well, the purpose of the language is to make programming easier. So my reasoning is easy = short + reusable + automatically bug-free. How to achieve short, reusable, and bug-free code is a mystery to me. Perhaps you are suggesting that targeting general-purpose programming is too great a challenge, and that in a small domain I could make such a language.

time and energy

Developing a new language is an interesting thing to do, but make sure you have a lot of time and energy for it:)

yeah

I have half a summer left to do it. Originally I was going to go for an entirely rewrite-macro-based system, where the compiler just read in a specification for a lexer and then read in rewrite rules. Writing the compiler could be done in Haskell in under 100 lines, and writing the rewrite rules would take a week. So I estimate about a week to get a working (non-optimised), hacky version of that language.

But then I discovered Metalua and other meta-languages. I was impressed, and realized that I needed something more challenging; otherwise I would just be rewriting somebody else's code with another syntactic sugaring.

Start simple

Maybe you are underestimating the effort required. Have you implemented a language before? If not, I suggest trying to implement an interpreter for a Scheme-like language first.

experience

I wrote a C compiler/interpreter in Visual Basic (compiled to my own very simple virtual machine with only a few instructions), to see if Visual Basic was a good language or not. What can I say - they are both imperative and therefore isomorphic. It worked in under 3000 lines of code, with no optimizations and a very bad O(N^2) runtime - using string manipulations that are not very good. I didn't know about shift-reduce parsing or any AST business. But that was when I was in high school. Now that I'm at uni, I feel like it's time to make something newer. Not Java :D

It's very good that you try

It's very good that you try to make a language that is easy to program in. However, you haven't answered my question. What do you want your language to do?
What kind of problems can it solve? Does it enable users to write robust and extensible code? What is the performance? Is it easy to develop new libraries?
There are many things to consider. I suggest you read The Design and Evolution of C++ by Bjarne Stroustrup. The book explains how C++ has been created and evolved.
Anyway, I think people should spend more time building good libraries for existing languages and less time on making a new language.

Purpose

Good questions. In reverse order, here is what I believe (argue back if these seem like bad ways to go about it):

1. I do not believe in user libraries. The language should provide every single library that is necessary. If you develop a new library which provides new powers, then it will be included in the language as a de facto standard. If a new library is developed which does the same thing but with different algorithms, the new library will be included as an optional implementation - it must retain the same interface. If a new library does the same thing but with different interfaces, then a decision will have to be made on relative merits, and either both are included or one has to go. Wouldn't life be easier if there were only one genetic algorithm library? (GALib is quite good.)

2. Performance is an issue only when you introduce declarative languages. With imperative languages you can tell the run time; with functional languages you have an upper bound. So for the latter two cases, I would like it to be within a factor of 3-10 of C's speed when using the same algorithm, with memory of the same order.

3.

4. What to do with the language: I would prefer to write a general-purpose language. I had in mind testing the power of the language by writing a Gmail-like server, a library/database web server, and a program that does genetic optimization of stiff-integration parameters.

No user libraries?

Are you suggesting then that if a law firm were to write custom functions, or classes, or whatever, for abstracting legal matters or forms--that this should be added to the standard library?

And I thought Java was big... :)

Seriously, though--there are many domain-specific concerns that a general purpose programming language (and its accompanying standard libraries) simply should not touch. Separation of concerns, limitation of scope, and all that.

What is in a library

Well, I don't see why not. Accounting templates are good: if you use Excel, you don't build your own accounting form when there are plenty of good ones. In fact, I recommend that a library include a few reference implementations of large projects that are easy to understand (at the top architecture/functions/whatever level - probably not in the small details).

For a person to write a program, they would use the standard library's reference implementation, and add or subtract from there.

Separation of concerns - the libraries and reference implementations should be governed by such concerns.

Clearly individual business rules could not be implemented. But if you can generalize some common business rules (e.g. sell at 3 dollars, valid today), then they should be included.

Rapid development is not just about having a magic compiler; frameworks and architecture should come for free if you agree with the standard library. If you don't agree, submit a petition to insert or modify.

You know what that means?

That means you (and I mean you, the language developer) will spend your time altering the standard library.

Not if only I use the

Not if only I use the language :P

I am guessing that probably only a couple of toy languages ever get far enough to pick up more than a dozen users. I'm not even past the design stage.

Some ideas

I want to be able to describe client-server programs with ten lines or less.

Your language should be based on signals. A function should not return values, but yield signals. Then programming would be as easy as connecting functions to signals. Example:

receiveMessages()
    => messageReceived(m : CloseMessage) : {
           close(); 
           receiveMessages();
    }
    => messageReceived(m : OpenMessage) : {
           open(); 
           receiveMessages();
    }

Incidentally, the above style can be reused for anything (GUI events, server/client events, exceptions, signals and slots, callbacks, concurrency, etc.).
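
For comparison, here is a rough sketch (in OCaml, with made-up names) of how the connect-handlers-to-signals idea can be encoded in an existing language; it amounts to registering a callback per message kind:

(* A toy signal table: handlers are registered per message kind
   and invoked whenever a matching message is emitted. *)
type message = Open | Close

let handlers : (message, unit -> unit) Hashtbl.t = Hashtbl.create 8

let on msg handler = Hashtbl.replace handlers msg handler

let emit msg =
  match Hashtbl.find_opt handlers msg with
  | Some handler -> handler ()
  | None -> ()

let () =
  on Open (fun () -> print_endline "opened");
  on Close (fun () -> print_endline "closed");
  emit Open;
  emit Close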

I want to be able to write gui code with just a data structure and telling it to represent itself.

You need named arguments and tuples with named members. A GUI structure can be expressed nicely with named arguments like this:

(Window(title="hello world", layout=Grid(2)) {
    Button(text="Exit", click=function() {
        exit()
    })
}).do

makes code less error prone. Typos and logic errors should be the only kinds that can exist. How can we reduce errors by making them impossible to express?

You need to treat values as types. For example, an integer variable with range 0 to 100 should be used only with values that are compatible. Example:

var x : 0 .. 100 = 50;
var y : 50 .. 60 = 55;
var z : 0 .. 1000 = 500;

//ok, because y falls in the range of x
x = y;

//error, because z is outside of the range of x
x = z;

By carefully telling the compiler which values are allowed at each specific place, most errors will be caught at compile time rather than at run time.

Static ifs could be used to promote values from one type to another. Example:

var i : int;
var x : 0 .. 100;

i = input();

if i in 20..80 then
    //we know that 'i' has a good value, therefore the code below is valid
    x = i;
end if

makes intention/purpose over implementation. In a perfect world, programming would be done with strong AI assistants (paid in CPU hours). Without AI, what methods would we use to let intention guide (potentially automatic) implementation?

You need compile-time code execution that can be used to transform code arbitrarily during a compile (and also do other side effects). For example, you can declare a database schema like this:

var db = database(username="john", password="blabla") {
    table Customers {
        column FirstName : string;
        column LastName : string;
    }
};

Then you could have the following meta code, which is executed after a translation unit is compiled and which transforms the translation unit:

meta {
    function database(AbstractSyntaxTree tree) {
        ///bla bla transform the code
    }
}

In this way, you don't even need macros.

OOP was touted as the solution to sharing. Is it?

OOP has its own problems, one of them being the mixing of subtyping and inheritance. I suggest separating the two by using type classes / structural subtyping.

makes reading, sharing, and reusing over writing.

1. don't let the user manage libraries and versioning. Make network imports and let the compiler import the appropriate library. Example:

import "www.mycompany.com/MyLanguage/".MyModule;

The compiler will check online for a new version of the library upon each compile by talking to the server named in the import declaration. If there is a new version, it will be downloaded and used; if not, the compiler will keep using the old one. If the contents of the new version are not compatible, the compiler will either fall back to the old one or declare an error.

This will make sharing code much easier than any other approach.

2. define a standard binary interface. In C++, I can't take code compiled with one compiler and use it with another compiler, which is quite a problem in some cases.

Your ideas

Thanks for the ideas!

Signals of the kind you describe feel a bit like Smalltalk to me. The plus is transparent concurrency and distribution.

For your GUI demonstration, what is the difference between the ( and the { brackets? I am thinking you use the curly ones to mean that whatever is contained inside goes into container widgets.

The idea of constrained types is quite interesting. Especially when you consider that many "for x in range(1..100)" type things are actually saying "for x in domain of integers 1..100". Restricted types are an interesting problem - how do you represent them? Is there a general form of restriction on other data structures?

Your testing of the value with "i in 20..80" is what most programmers should do. Perhaps a better way is that all user input should be validated, with validation kept separate from input. It would be interesting to see if languages can be made where good programming practice is unavoidable...

I am not sure how your meta schemes are different from macros. Lisp macros work on ASTs, while C macros are a bit dirtier.

What is your difference between classes and structure?

For your GUI demonstration,

For your GUI demonstration, what is the difference between the ( and the { brackets? I am thinking you use the curly ones to mean that whatever is contained inside goes into container widgets.

Parentheses are for function calls.
Curly brackets are for declaring tuples.
It was only a quick and dirty example, but you can do better than that. For example, you can have function calls inside parentheses (something like LISP), and therefore the creation of trees (e.g. widget trees) could become:

(window text="Hello World" layout=(grid 2) children={
    (button text="Quit" click = \(close))
})

Please note the \ notation for lambda (stolen from Haskell).

Restricted types are an interesting problem - how do you represent them? Is there a general form of restriction on other data structures?

Basically, in current mainstream languages, every variable has an implicit set of allowed values. It is implicit in the sense that only the programmer knows about this set; the compiler does not. So all you have to do is attach a set of allowed values to each instance, then check whether computations return results within the constraints of each type.

You don't have to actually execute the program's computations, only create new sets of values for each result according to the expressions of the computation. For example, if your range is 0 to 100, then adding 1 to it results in the range 1 to 101. I am no mathematician, but there is relevant mathematical theory about this: interval arithmetic and, more generally, abstract interpretation.
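
A minimal OCaml sketch of that checking step, assuming integer ranges only (the names are illustrative):

(* Track a [lo, hi] interval for each expression instead of running it. *)
type interval = { lo : int; hi : int }

(* Adding two expressions adds the endpoints of their intervals. *)
let add a b = { lo = a.lo + b.lo; hi = a.hi + b.hi }

(* An assignment is safe when the value's interval fits inside the target's. *)
let fits ~value ~target = target.lo <= value.lo && value.hi <= target.hi

let () =
  let x = { lo = 0; hi = 100 } in           (* var x : 0 .. 100 *)
  let y = { lo = 50; hi = 60 } in           (* var y : 50 .. 60 *)
  let y1 = add y { lo = 1; hi = 1 } in      (* y + 1 : 51 .. 61 *)
  assert (fits ~value:y ~target:x);         (* ok *)
  assert (fits ~value:y1 ~target:x)         (* ok: 51..61 fits in 0..100 *)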

I am not sure how your meta schemes are different from macros. Lisp macros work on ASTs, while C macros are a bit dirtier.

The difference is that while LISP has a special construct named 'macro', a meta-language does not.

What is your difference between classes and structure?

OOP has two different things mixed together: implementation inheritance (Button inherits from Widget) and interface inheritance (Button is a Comparable).

Implementation inheritance is easy to understand: a component is made out of another component plus a few new bits.

Subtype inheritance means that a class signature is cast in stone: unless a Button inherits from Comparable, for example, it cannot be used where a Comparable is required.

More often than not, programmers have to deal with classes that do not implement the right interfaces, and so they need to write adapter classes.

One way to avoid this is to not have classes inherit from interfaces, but to instantiate interfaces separately from classes. This is called 'structural subtyping': if an object quacks like a duck and walks like a duck, then the object is a duck for all intents and purposes (this is also called 'duck typing').
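
OCaml's object system is one existing example of this: a function states only the methods it needs, and any object with those methods is accepted, with no declared interface and no adapter class. A small sketch:

(* The function only requires the methods it uses; its inferred type is
   < quack : unit; walk : unit; .. > -> unit. *)
let describe duck =
  duck#quack;
  duck#walk

(* This object never declares any interface, but it has the right shape,
   so it is accepted (extra methods such as fly are fine). *)
let mallard = object
  method quack = print_endline "quack"
  method walk = print_endline "waddle"
  method fly = print_endline "flap"
end

let () = describe mallard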

Guidelines for language design

This post gives some basic guidelines to help when designing a new language. The rest of the discussion is interesting too.

Writing short code

1. makes code shorter. I want to be able to describe client-server programs with ten lines or less.

This distributed contract net protocol is written in Oz in three lines. The basic idea is to use functional combinators (Fold, Map, etc.) as concurrency patterns (see sections 5.4.3 and 5.2.1 in CTM). This is completely natural with dataflow variables. It works in a distributed setting if the language supports network-transparent distribution.

I want to be able to write gui code with just a data structure and telling it to represent itself. How? Standard templates? Code generation of gui, protocols? Transparent functions and data?

The QTk tool in Oz lets you write GUI code as a nested record, cleanly separating the declarative and procedural parts of a user interface. We measure 1/4 to 1/3 the number of lines of code of procedural toolkits (like Tk itself). As a bonus, since GUIs are data structures they can be computed. See chapter 10 of CTM or the QTk documentation in Mozart.

It works in a distributed

It works in a distributed setting if the language supports network-transparent distribution.

How are network failures handled? Mark Miller tends to favour so-called "semi-transparent" distribution, since network failure modes are so different from local failures.

Handling network failures

We find that the best way to handle network failures is by using fault streams. See Raphael Collet's Ph.D. thesis The Limits of Network Transparency in a Distributed Programming Language. We are very close (a few days!) to releasing the new version 1.4.0 of Mozart that completely implements the model explained in this thesis. This model can be seen as a generalization of Erlang's model for failure suspicions (temporary failures).

The model is consistent with Mark Miller's definition of semi-transparent distribution: "Any correct program written for a bunch of objects distributed over the network will remain correct when the objects in question are thrown together in the same address space" (his definition).

Thanks, looks like an

Thanks, looks like an interesting paper. I'll give it a careful read. :-)