LtU Forum

Impact of static type systems on productivity of actual programmers: first experiment I've seen documented.

It is disappointing that no particular results positive or negative were observed. But it is gratifying, and long overdue, that someone finally thought the question was important enough to perform an actual experiment to answer it.

http://courses.cs.washington.edu/courses/cse590n/10au/hanenberg-oopsla2010.pdf

DARPA funds $11 million tool that will make coding a lot easier

http://www.engadget.com/2014/11/09/darpa-pliny-coding/

That's a lot of money for a good idea (using large bodies of existing code to complete what the programmer write's) that everyone seems to be working on. Does Rice have some special advantage here that justifies the investment?

Why do we need modules at all?

Post by Joe Armstrong of Erlang fame. Leader:

Why do we need modules at all? This is a brain-dump-stream-of-consciousness-thing. I've been thinking about this for a while. I'm proposing a slightly different way of programming here. The basic idea is:

  • do away with modules
  • all functions have unique distinct names
  • all functions have (lots of) meta data
  • all functions go into a global (searchable) Key-value database
  • we need letrec
  • contribution to open source can be as simple as contributing a single function
  • there are no "open source projects" - only "the open source Key-Value database of all functions"
  • Content is peer reviewed

Why does Erlang have modules? There's a good an bad side to modules. Good: Provides a unit of compilation, a unit of code distribution. unit of code replacement. Bad: It's very difficult to decide which module to put an individual function in. Break encapsulation (see later).

EigenCFA: Accelerating Flow Analysis with GPUs

Here's a very impressive speedup on a program analysis problem by using a GPU.

EigenCFA: Accelerating Flow Analysis with GPUs - Tarun Prabhu, Shreyas Ramalingam, Matthew Might, Mary Hall

We describe, implement and benchmark EigenCFA, an algorithm for accelerating higher-order control-flow analysis (specifically, 0CFA) with a GPU. Ultimately, our program transformations, reductions and optimizations achieve a factor of 72 speedup over an optimized CPU implementation.

We began our investigation with the view that GPUs accelerate high-arithmetic, data-parallel computations with a poor tolerance for branching. Taking that perspective to its limit, we reduced Shivers’s abstract-interpretive 0CFA to an algorithm synthesized from linear-algebra operations. Central to this reduction were “abstract” Church encodings, and encodings of the syntax tree and abstract domains as vectors and matrices.

A straightforward (dense-matrix) implementation of EigenCFA performed slower than a fast CPU implementation. Ultimately, sparse-matrix data structures and operations turned out to be the critical accelerants. Because control-flow graphs are sparse in practice (up to 96% empty), our control-flow matrices are also sparse, giving the sparse matrix operations an overwhelming space and speed advantage.

Whither Flow Analysis?

In the comments to a post about CSE in Guile, NeelK mentions we need more Flow Analysis. The other day I was reading A Flow-Analysis Framework for Realistic Scheme Programs which was done in Scheme48. GHC does some fusiony stuff. What is the state of the art? Who is working on it? When will it come to a compiler / javascript interpreter near me?

How can be a interpreter faster than C (aka: kdb+)

I read a submission on HN that talk about Kdb+.

In the linked page it claim the interpreter is faster than C:

Kdb+ is a testament to the rewards available from finding the right abstractions. K programs routinely outperform hand-coded C. This is of course, impossible, as The Hitchhiker’s Guide to the Galaxy likes to say. K programs are interpreted into C. For every K program there is a C program with exactly the same performance. So how do K programs beat hand-coded C? As Whitney explained at the Royal Society in 2004, “It is a lot easier to find your errors in four lines of code than in four hundred.”

Now I'm a noob about compilers, but I understand that a)Interpreters are easier to code than compilers b)And them are slow, probably very very slow.

How can be faster? How do it?

I don't see how "“It is a lot easier to find your errors in four lines of code than in four hundred.”" can be the explanation.

Opposing Hierarchies of Complexity

Here is a thought for discussion: There are two models in programming

1. The machine/cpu model with pointers, addressing, manual memory allocation, modular arithmetic, floating point numbers etc.

2. The mathematical model with values, integers, real numbers etc.

The code itself built up from machine-code is simplest when closest to (1), and that abstractions are built of ever increasing complexity. For example a GC is a fairly complex piece of code. Adding two stack allocated machine-word ints is not.

The mathematical model is built from type-theory. We can define mathematics constructively. (I am thinking more of Twelf than Coq here). It seems the type system is at its simplest expressing the fundamental concepts of mathematics, but modelling a real CPU is a complex application for type-theory (and mathematics).

So a function with a simple type, has a complex implementation (involving garbage collection and layers of abstraction), and a simple function would have a complex type (possibly something involving linear refinement types, monads etc). You can see this in the difference between a simple functional language which requires a large runtime, and typed assembly language which requires a complex type system.

TAL, whilst an interesting idea seems to low level fixed on a single CPU instruction architecture. So the idea is a language like 'C' with a complex type system like TAL, that can implement a pure functional DSL which would be enforced by the type system. You could have a pure strict functional language, but where a simple monadic program would compile directly to 'C' code.

There should also be some proofs, for example the garbage collector implementation should result in the correct simple types for objects managed by the GC. The complex linear/effect types expressing the manual management of memory will only disappear from the type signatures if the GC code is correct. As discussed in another topic, the GC would require the language to have the capability of reflecting memory layout, stack context and concurrency control.

I am interested in any work / papers that might present a similar language concept, or develop a suitable type-system or implementation to this.

Eve development diary

We put up a development diary for our work on the Eve language. The first post is a rough overview of the last nine months of development as reconstructed from github and slack, starting with the original demo at Strange Loop. In the future we will run more detailed posts once per month.

The original motivation came from reading an old comment on LtU bemoaning the fact that there is very little record of the development of todays languages and the thoughts processes behind them. Hopefully this will be useful to future readers.

Tools for layered languages?

I like things like Shen and Haxe and ATS, where I can take a high level language and target another, but more common language. The intent is to be able to e.g. play with: how portable your source code is vs. how much you use 'native' libraries. So Shen -> {Python,Javascript,...} or Haxe -> {C++,Javascript,...} or ATS -> {C,PHP,...}.

But of course a problem is that it is unlikely to come with real tooling like a source line debugger, or profilers, or etc.

Apparently there is some leverage to be had if you are targeting C, but it seems more like dumb luck than based on principle, to me. :-)

Does anybody know of efforts to "solve" this "problem"?

automatic test discovery without reflection?

A programming language which is intended for production use must support the ability to easily create unit tests. Unit testing frameworks typically support a feature called test discovery, whereby all tests that are linked within an executable can be automatically detected and run, without the need for the programmer to manually register them.

For many languages such as Java and Python, reflection is used for this purpose. For languages that don't support reflection (e.g. C++) other, more cumbersome, methods are employed such as registration macros.

One problem with using reflection is that for AOT (Compiled ahead-of-time, as opposed to JIT) languages, reflection is expensive in terms of code size. Even in compressed form, the metadata for describing a method or class will often be larger than the thing being described. This can be a significant burden on smaller platforms such as embedded systems and game consoles. (Part of the reason for the bulk is due to the need to encode the metadata in a way that is linker-compatible, so that a type defined in one module can reference a type defined in a different module.)

Most of the things that one would use reflection for can be accomplished via other means, depending on the language. For example, reflection is often used to automatically construct implementations of an interface, such as for creating testing mocks or RPC stubs, but these can also be done via metaprogramming. However, metaprogramming can't solve the problem of test discovery, because the scope of metaprogramming is limited to what the compiler knows at any given moment, whereas the set of tests to be run may be a collection of many independently-compiled test modules.

A different technique is annotation processing, where the source code is annotated with some tags that are then consumed by some post-processor which generates additional source code to be included in the output module. In this case, the processor would generate the test registration code, which would be run during static initialization. The main drawback here is complexity, because the annotation processor isn't really a feature of the language so much as it is a feature of the compilation environment - in other words, you can't specify the behavior you want in the language itself.

I would be curious to know if there are other inventive solutions to this class of problems.

XML feed