How to determine whether your new language is perfect?

I've come up with some interesting ideas for a new programming language. Since I joined this noble society I've realized I wasn't the only one with ideas, not even the only one with my ideas... So apparently, I need to take it one step further: how do I combine all those fancy ideas into a useful programming language?

Since I've got some ideas for this too, my current problem is more: how do I know I've got it right? I need some help here. I suspect more people than me have delved into this topic, but my impression is that the work tends to go deep rather than broad. I need both. I want a language that is good overall, supporting all kinds of programming styles and designs, but I also want it to be really good at something. So how do I determine the strengths and weaknesses of my shiny new language?

After pondering a while on this, I've divided this problem into two categories:
a) What programming styles does it support? E.g. functional programming, imperative programming, concurrent programming, ...
b) How easily does it solve different kinds of problems? E.g. implementing a controller scenario, server/client messaging, game AI, ...

What I mostly need is a set of problems, each of which can be implemented in very few lines of code in some language (picking the strongest language for the problem in question), keeping the set as small but as diverse as possible. Then I could take my language, do a test implementation of each kind of problem, and see how elegantly it solves each one.

Some suggestions:
1) Handling dependencies. For a certain output, a certain input is needed, on which an action is taken. This is what makefiles do. E.g.:
myapp: source1.o
	gcc -o myapp $^

Well, maybe the above was too trivial, but I think you get the idea...

2) Object repository. A server has a database of "things". You can register things, unregister things, search for a thing by name or attribute, and subscribe to register/unregister events.
No trivial source code example comes to my mind.
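
Perhaps the closest thing is a bare interface sketch (hypothetical names, rendered in C just to fix ideas; the implementation is the interesting part that's missing):

typedef struct thing thing_t;                     /* opaque handle for a "thing"   */
typedef void (*repo_callback)(const thing_t *t);  /* fired on register/unregister  */

int repo_register(thing_t *t);                    /* add a thing to the repository */
int repo_unregister(thing_t *t);                  /* remove a thing                */
thing_t *repo_find_by_name(const char *name);     /* search by name                */
thing_t *repo_find_by_attribute(const char *key, const char *value); /* by attr   */
int repo_subscribe_register(repo_callback cb);    /* notify on register events     */
int repo_subscribe_unregister(repo_callback cb);  /* notify on unregister events   */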

Uhm, that was about as far as my imagination took me right now. I might come up with more given some more time...

Anyone having ideas on this?

standard libraries

Sure, PLEAC (Programming Language Examples Alike Cookbook) is a good list of programs.

If you want something more biased towards functional languages, a good list is the Haskell version of the Ninety-Nine Prolog Problems; you can find an updated list for Prolog as well. The Project Euler problems have been done in almost every language, and they are good because the obvious elegant solution is often far too slow.

The Computer Language Benchmarks Game problems are a good set for comparing performance, again because you have about a hundred language solutions with their scores and source, so you can retest the winners on your own hardware. This is particularly good because you'll see whether the elegant solutions perform terribly.

If you want your language to be able to do lower-level stuff, the C standard library is a well-documented and well-understood list. Along the same lines, the old-fashioned Unix command-line utilities are good ones to test. For example, can your language actually read and update the system clock, or is it just wrapping libraries?
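
For concreteness, here is the reading half of that clock test in plain C (POSIX clock_gettime; the updating half would be clock_settime, which normally needs root):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec ts;
    /* Ask the OS for the real-time clock directly, no wrapper library. */
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return 1;
    printf("%lld.%09ld seconds since the epoch\n",
           (long long)ts.tv_sec, ts.tv_nsec);
    return 0;
}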

I could probably come up with more if I kept going... is this what you were looking for, for (b)?

I had something on higher

I had something on a higher level in mind. But using your links, I might actually get a start. Or what do you think about this (the first two copied from my first post):

b.1) Dependency handling à la make. How to express dependencies between entities in the program and let the compiler/interpreter figure out the correct order. I'd like to see more complex cases than make handles easily, e.g. multiple outputs for multiple inputs.
b.2) Object repository/name server. Should show how to handle stateful remote communication. Also show how interfaces are shared over networks, how versioning of interfaces works, and how subscriptions and events are handled on a potentially lagging and unstable network.
b.3) String handling. Could also include special but important cases such as path handling.
b.4) Algorithms. How easy it is to write number-crunching algorithms that are intended to scale across multiple cores.
b.5) Simple user interface implementation. The idea here is to show how easy it is to get event handling and machinery/interface separation.

The above suggestions seem a little more complex than I initially thought I'd like. But the alternative seems to be a lot of small examples instead. I think having fewer, larger examples might give a better indication of the expressiveness of the language, and could also cover more complex concerns, e.g. how to handle errors/exceptions over a network.

Distributing the above examples both on SMP machines and across multiple nodes in a network should make things interesting. I guess what is left is to make a more formal definition of the above scenarios, and then pick some example languages to implement them in.

How to objectively measure

Most of those would be features of libraries or tools. Most languages don't include that sort of thing at all. Further, I'm not sure how you'd measure it objectively.

Is make worse or better than ant? How do you compare them to build systems that include source code control? Which sorts of features are useful? For example, Java and Lisp have canonical editors; how valuable is that?

Objective measurements

For my own needs I do not require objective measurements. I first want a language that _feels_ right. Then I need to check whether it has serious weaknesses that would actually make it pretty useless. Last, but not least, I'd need to verify and measure it.

So what I was thinking of was to define a number of examples that you could write as applications. If I then use my newly invented language to write those applications, I will get a feel for whether it works or not. Counting the lines of code isn't my main goal, though it would be nice to see how large the code gets. It's the "how easy was it to do" factor that is most important.

From that perspective I'd say make and ant would be a tie on the dependency example, but one of them might be stronger on other applications. Maybe make would fail miserably at the object repository. Just a guess, but it might...

But you do have a point. How do you value the IDE? Surely, for a brand-new programming language there will probably not be a very good IDE around. And I could imagine a language that would be pretty useless when writing code in Emacs, but that would really shine when Eclipse is used. It seems harder to imagine the opposite case, but I guess that would be possible too.

Well, I guess I can't have it all at once...

Subjective is the realm of

Subjective is the realm of design; objective is the realm of science. Generally, objective analysis comes after subjective design, though objective analysis feeds back into future subjective designs by shaping your own biases.

Try and experiment. Don't spend so much time discussing what you want to do here, just come back when you've done it and let us critique it.

Soft Advice

Pick a standard you can be happy measuring yourself against, make sure you understand it, and understand why it works for you at the present moment. Crystallize whatever beliefs you have in building the system. Don't change those beliefs unless they are proven wrong, for example by being unable to implement a feature efficiently.

Document what didn't work so that when somebody asks you why something failed you don't look like an idiot stuttering for answers. It's okay to fail, but being 'perfect' really means defending yourself and not looking like a moron.

And what to do if you don't have any beliefs?

What most people need is someone to tell them what to do. It is cheaper and better to adopt a false religion than to remain a skeptical atheist, seeking after truth oneself. Of course, it is better and cheaper still to adopt a reasonable religion. -- Phil Greenspun

Easy answer

If you don't have a clear story about which problems your new language will solve better than the competition, then there is no point in designing/implementing the language. If you _do_ have a clear story, then it should suggest examples directly.

Real software projects with users are much more convincing than anything you've suggested here.

yes and no

there's the part of the game where you are trying to start off in a new direction. your advice sounds reasonable there, that it should be based on real problems.

but then, there's the part of the game where you want to sanity check what you are doing, and to see how widely applicable it is, or in other words where it gets itself tangled in knots. "i've invented class based oo / actors / rewrite rules / and it will obviate the need for anything else for eternity! ... oh except for all those cases where it actually kinda sucks to use. drat." i hazard to guess that those are times when the OP's approach as i read you characterizing it makes sense.

This was what I was after

This was what I was after: making a sanity check of my new language ideas. The point here isn't to make sure my language is perfect for everything; I know that will not happen. But I'd like to make a language that is strong at something (in my case, concurrency and distribution) without totally sacrificing usability in other areas.

My experience is that if you stress-test design ideas meant to solve a certain problem, throwing other problems at those ideas will tell you how universal they are. This sometimes leads to ideas that are both more universal and better at solving the original problem.

I disagree. Sometimes

I disagree. Sometimes different doesn't start out as better, but can lead to better later on. Even a failed project at least gives us a better understanding of the language design space.

But it really must be different. If you can't be better, might as well be bizarre. Better and bizarre is the best combination of course, because we learn new things AND we get a better language to work with.

perfect is a tough nut

Instead of perfect, you probably want "suitable for X" for many values of X. It's pretty easy to demonstrate that "superior at X" also means "inferior at Y" for some value of Y, since strength in one area nearly always means weakness in another. Focus in one place means absence of focus in some other place. But you can game the problem by making Y something you have little desire to do. However, odds are high that someone else really wants to do Y, even if you don't.

I admire rash questions like yours. But "perfect" leaves you wide open for criticism, and you might be ignored more than you want. I bet "effective" would have yielded more responses.

Your language, or your description of it, needs some model of time characterizing when things happen. A timeless model is rather limiting. Since a language will get mapped to some processor execution model at runtime, an explicit model of signal propagation might improve clarity.

My work leaves me highly biased towards thinking in terms of signal propagation, so I don't know how to be objective about it. But you can only change so many bits and bytes at a time during various windows of opportunity during execution. So data structures of any size only reach target states as time, conflicts, and errors permit. Your language ought to characterize when known states are known to occur, and how they can be examined or tested in some empirical sense. Or rather, maybe only the meta-language you use to describe things will need to do that.

The notation you choose can make various details more or less clear. Unless your language scheme is so fluid that it can rearrange itself to suit the problem at hand, it must be more verbose for some problems if tuned to be concise for others (this follows from information theory).

Were you looking for ideas for example problems in which you hoped a language would prove effective?

Feel free to describe your interesting ideas for a new programming language, unless you want to avoid feedback until you refine issues on your own.

I had the same question earlier

A couple of years ago I posted a similar question here. Almost all the responses were good. What I ended up doing was taking examples from SICP and other texts and translating them into my language.

Is it Natural

By which I mean, there are really only two criteria:

First, can you just sit down and write the code you want?

Second, can you simply browse existing code and understand it?

Criterion One explains why dynamic untyped languages are popular. Criterion Two explains why CPAN made Perl popular.

I have no idea why C or C++ are popular, since they fail both these criteria; perhaps, because of prior torture, they're simply the best-understood evil.

Why C is popular

I have no idea why C or C++ are popular, since they fail both these criteria; perhaps, because of prior torture, they're simply the best-understood evil.

C is a portable, higher-level form of assembly language. It lets you write almost anything you would write in assembly, more easily than you would in assembly (your first criterion). It is much easier to read than assembly as far as the abstractions go, while anyone who knows assembly can almost do the compilation in their head (your second criterion).
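
To make "compilation in your head" concrete, here is a sketch; the assembly in the comments is illustrative x86-ish pseudo-output, not any particular compiler's:

/* Summing an array: each C line maps to a short, predictable
   instruction sequence. */
int sum(const int *a, int n)
{
    int s = 0;                    /* xor  eax, eax                      */
    for (int i = 0; i < n; i++)   /* xor  ecx, ecx ; cmp ecx, esi ; jge */
        s += a[i];                /* add  eax, [rdi + rcx*4] ; inc ecx  */
    return s;                     /* ret                                */
}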

I'm not sure why you find it a mystery.

C = portable assembly?

I've heard that description of C a number of times, but I've never been able to make the connection. In assembly you fetch values into registers from specific memory locations, do primitive operations on them, and push the results back out to memory. The only flow control is the (conditional) jump (goto), code and data can be intermixed (allowing for frequent use of self-modifying code), and there are practically no data types. In C, by contrast, although there is a goto it is seldom used, data is typed and not accessed directly (except through pointers), and code and data are completely separate. Is it pointers that make C a "high-level assembler"? Or that, plus the fact that most statements can be directly translated into specific sequences of assembly code (when not applying optimizations)?

I haven't seen the specifics detailed anywhere; maybe if I write my own C compiler it will become clear?

pointers

Well, first off, in practice everything in C is done with pointers. The notation encourages you to deal with data structures at the pointer level.

Further, void pointers are frequently used to get around the type system, so not uncommonly you are dealing with data structures (or functions) as X bytes of untyped memory. Void pointers are also used with functions to create polymorphic behavior. As an example of both, the standard C library's qsort is defined to act on void pointers, and you pass in the element size and a comparison function:

void qsort(void *base, size_t num_elements, size_t element_size,
           int (*comparer)(const void *, const void *));

Expanding on that example, you see the function taking code as data, i.e. functions as arguments, so the separation between code and data is not so absolute. And again, this is right out of the standard library; I'm not using some obscure feature here. While in theory you could write C like Pascal and use only an infrequent pointer, pretty much no one does that.
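
A minimal sketch of that void-pointer polymorphism in action (standard C, nothing exotic):

#include <stdio.h>
#include <stdlib.h>

/* The comparison function sees untyped memory and casts it back. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    int xs[] = { 42, 7, 19, 3 };
    size_t n = sizeof xs / sizeof xs[0];
    /* qsort knows nothing about int: just a base pointer, a count,
       an element size, and a comparison callback. */
    qsort(xs, n, sizeof xs[0], cmp_int);
    for (size_t i = 0; i < n; i++)
        printf("%d ", xs[i]);
    printf("\n");
    return 0;
}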

As far as assembly and control structures go, assembly has both loops and subroutines. The most common control structure is JSR, Jump to Subroutine (BAL on IBM mainframes), which pairs with RTN (return) and uses a call stack; not much different from functions in any higher-level language. Richer instruction sets like the 8086's allow for more sophisticated control, reducing the number of JSR->RTN->JMP sequences by using subroutine calls with computed offsets (essentially branching with a variable offset), kind of like low-level case statements but more powerful.

Now with all that behind us...

Or that, plus the fact that most statements can be directly translated into specific sequences of assembly code (when not applying optimizations)?

Yes. That's what I meant. You can see the assembly implementation of C code in your head.

Mystery

Ah, JeffB, I believe you're right. I find it a mystery because for the last three decades I've tried to distance myself from such a low-level view of computing. Using OCaml I've been able to write better programs without such considerations, and I wonder why others have remained stuck in the dark ages.

My guess is that programmers are particularly conservative, especially in a work environment; worse, the industry as a whole is burdened by an unwarranted inertia due to a false belief that much old code is better repaired and extended than replaced by new code: industry believes the differential pain involved is lower than a whole new kind of pain.

It's not true! Look, for example, at how easy it is to work with lists in OCaml compared to the huge number of (repeatedly reinvented) data structures and routines in C.

The truth is that industry has daily and weekly progress targets and can't afford the luxury of year long rewrites of millions of lines of code, even though it would pay off in the longer run.

C++ is successful because it provided an amortised upgrade path.

Now can you explain why Java is successful?

Systems programming

Glad you found that insight helpful.

Using OCaml I've been able to write better programs without such considerations, and I wonder why others have remained stuck in the dark ages.

I use Haskell and Perl primarily. I agree with you, but I do know C and I appreciate its advantages. I think the primary advantage of C is this:

Quite often there are competing algorithms with cost functions of the form

A*n*log(n) + B*n + C versus D*n^2 + E*n + F

where A, B and C are much larger, like 100x larger, than D, E and F. From a computer science standpoint that doesn't matter, but quite often in real life n is in the range 3-11 or so, and that tradeoff is worth making. C keeps the developer's focus on looking for those sorts of efficiencies. In practice C code is often 2-3x as fast as languages like Java, and something like 100x faster than high-level languages.
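
A toy illustration of that crossover (the constants are invented for the example):

#include <math.h>
#include <stdio.h>

/* An O(n log n) algorithm with large constant factors versus an O(n^2)
   algorithm with small ones: for the small n common in practice, the
   asymptotically "slower" algorithm is far cheaper. */
int main(void)
{
    for (int n = 3; n <= 11; n++) {
        double fast = 100.0 * n * log2(n) + 100.0 * n + 100.0; /* A=B=C=100 */
        double slow = 1.0 * n * n + 1.0 * n + 1.0;             /* D=E=F=1   */
        printf("n=%2d   n*log(n) cost: %6.0f   n^2 cost: %4.0f\n",
               n, fast, slow);
    }
    return 0;
}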

To pick a few personal examples:


  • For example, a decade ago I wanted to manipulate hardware directly, not through libraries. I was testing security software, so I wanted to be able to create TCP/IP packets that were malformed in all sorts of ways. If you want to overrun buffers on purpose, low-level languages are terrific.
  • I had a project where I needed to get an extra 7% off the disk, so I wrote low-level routines with no filesystem and asynchronous I/O. I got the minor extra speed boost at the cost of having everything be custom.
  • I wanted a modified UDP cache that did some semi-processing and made intelligent filtering choices for huge bursts. Again this was all about speed: it had to keep up with the hardware, so it had to be in C.

Now can you explain why Java is successful?

Sure.
1) There was a huge need for a language that could run on multiple hardware or virtual-hardware configurations, especially in the embedded space. One area where Java has been hugely successful is low-end phones: manufacturers can chase the cheapest parts at any given time, confident they can create an entire OS/application stack within weeks of piecing together the system.

2) The technology for event-driven programming was designed primarily around C++, and it was easy to migrate that technology over to Java. At the same time, the primary bugs in C++ were pointer errors, which Java eliminated.

3) Sun, Oracle and IBM spent a fortune on Java, subsidizing a free language and creating great tools, all available for nothing.

What language wouldn't be successful if you threw billions of dollars of support at it? Now make the transition path smooth and have it genuinely fill an important niche....

portability, community, interop, ...

Ah, JeffB, I believe you're right. I find it a mystery because for the last three decades I've tried to distance myself from such a low-level view of computing. Using OCaml I've been able to write better programs without such considerations, and I wonder why others have remained stuck in the dark ages.

The performance implications of "modern" languages have already been discussed, so I'll avoid adding my 2c there. Some other things to consider:

Portability

The only code that you can expect to run on current and future client platforms (Mac, PC, iOS, Android, NaCl) is code written in C/C++. When platforms come into existence, they come with gcc/g++; anything else is cobbled on later as an afterthought and is constantly playing catch-up as the platform evolves.

The closest thing to a solution here is the Mono/Xamarin stack. Speaking from personal experience, while it does sort of do what it claims, it isn't reliable or mature enough to base a business on.

Community

If I need to talk to some random REST API, chances are I can download a library for Java or C#, but not for Haskell or OCaml. Likewise, if I need to handle XML/JSON or access an HTTP server, the solutions are more usable, flexible, and performant in the "dark ages" languages. Simple list manipulation is great, but it's not going to implement OAuth or TLS for me.

These "dark ages" platforms have much larger communities of people making code and libraries available than "modern" platforms do. This saves a ton of time when you're actually writing code.

Tooling

IDEs, debuggers, quality vi/emacs modes, profilers, etc. All of these are in much better shape for the "dark ages" languages. I would never want to be in a position where I was debugging an OCaml program remotely on a cell phone or other embedded device, but this is a completely reasonable thing to do in C.

LLVM and portability

Speaking of generic assemblers and language portability: LLVM was designed for exactly this. It is essentially a portable assembler. If your favorite language can generate LLVM, you then link in support for your favorite hardware. Languages no longer have to support each and every thinkable hardware platform; other tools can do that final step instead.

Furthermore, you get whole-program optimization, and you can run the result in a virtual machine if you want.

This might actually save the day for niche languages.
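
For example (assuming you have clang installed), you can look at the portable-assembler form of a C function directly:

/* square.c — emit LLVM IR with:  clang -S -emit-llvm square.c -o square.ll
   The .ll file can then be optimized (opt) and lowered to native code (llc)
   for whichever target you link in, without the source language's compiler
   knowing anything about that hardware. */
int square(int x)
{
    return x * x;
}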

LLVM IR is a compiler IR

See rather: LLVM IR is a compiler IR, on the llvm-dev mailing list (and the lengthy discussion following it).

In this email, I argue that LLVM IR is a poor system for building a Platform, by which I mean any system where LLVM IR would be a format in which programs are stored or transmitted for subsequent use on multiple underlying architectures.

Now I got a bit disappointed

Now I got a bit disappointed in LLVM. I thought it would be portable.

Thanks for telling me before I wasted my time on projects that assume LLVM is portable.

It also has pretty poor

It also has pretty poor support for precise GC.

common misconceptions

C is a portable, higher-level form of assembly language

This is not true.

  1. What Every C Programmer Should Know About Undefined Behavior 1/3, by LLVM's Chris Lattner. There are two more articles that follow this one.
  2. C is Not Assembly, by James Iry.
  3. Moron Why C is Not Assembly, also by James Iry.

Hmm

I read the articles, and he is nitpicking the common wisdom. I agree with his points about unspecified behavior and the lack of direct stack manipulation, but basically his criticism comes down to:

C is only portable assembly on the types of operations that port between different assemblies.

OK, I'll grant that. C is a level up from assembly, NQA. As I said, you can compile the assembly from the C in your head, but it is a compiler, not a simple code transformer like going from assembly-with-variables to hard addresses.

I'll stand by the original while agreeing with everything James Iry wrote in terms of the reasons he rejects the characterization.

tbaa

Have you ever been caught by aliasing-related optimizations? I believed as you did until that happened to me. Actually, it has taken a few failures for me to get the C-really-is-not-assembly religion; the latest one was pointed out to me just yesterday.
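
A sketch of the kind of code that bites (exact behavior varies by compiler and flags; this one is aimed at gcc -O2, where -fstrict-aliasing is on):

#include <stdio.h>

/* Under strict aliasing the compiler may assume an int* and a float*
   never point at the same storage, so it can reorder or cache these
   accesses. The "portable assembly" reading of the code is wrong. */
static int read_after_stores(int *ip, float *fp)
{
    *ip = 1;
    *fp = 2.0f;   /* assumed not to touch *ip ...                */
    return *ip;   /* ... so this may be folded to the constant 1 */
}

int main(void)
{
    union { int i; float f; } u;
    /* Deliberately pass two pointers to the same storage: an optimizing
       compiler typically prints 1, not the bit pattern of 2.0f. */
    printf("%d\n", read_after_stores(&u.i, &u.f));
    return 0;
}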

Nope but then again I'm

Nope, but then again I'm generally using C for slower hardware like the networking card or the hard drive, just trying to keep the CPU's complex work running ahead of those systems. I need fast, but not perfect, from the CPU.

I can see how, if it's the CPU you're rushing to get every ounce out of, assembly is better. So, good point.

Bias

First, can you just sit down and write the code you want?
Second, can you simply browse existing code and understand it?

I don't deny that these are important criteria for any language that you want to use in a production setting, but this would seem to be biased towards "languages that look and work a lot like what I already know." Always following the principle of least surprise makes it hard to introduce anything truly revolutionary (or even just novel).

As a concrete example, the unification of classes and methods in BETA/gbeta is initially bewildering to many programmers, but that unification is part of what allows so much flexibility and power in a language with so few concepts. Scala provides much of the same flexibility with a more familiar/"natural" mental model, but one could argue that some of the simplicity is lost.

I guess this all comes back to the question of "perfect for what?" If the answer is "for broad adoption by practicing programmers in field <X>," then "natural"-ness and the principle of least surprise are probably good things.

Reading C++ code

I think C code can be made pretty unreadable. Anyone who knows C probably knows this too well; for the others, see The International Obfuscated C Code Contest.

I've seen a serious attempt at redefining the C++ language so it can express BNF language definitions. It's called Boost::Spirit. It's, uhm, interesting...

After recovering from the shock, I started thinking about whether the authors actually have a point. There are a number of different ways to express things in text: mathematics has a wide range of symbols to express algorithms, BNF uses its own symbols and expressions, and so on. The programming language you're using adds its own. If you accept the idea that different parts of a program should use the notation that best fits the specific problem, then how do you tie it all together into a coherent program?

Could having a language that permits redefining operators, and even changing the syntax at least slightly, be the way to go?

I'm not sure if this is something good, but if nothing else, it gets my head spinning for a while...