Software Engineering

Automating Ad hoc Data Representation Transformations

Automating Ad hoc Data Representation Transformations by Vlad Ureche, Aggelos Biboudis, Yannis Smaragdakis, and Martin Odersky:

To maximize run-time performance, programmers often specialize their code by hand, replacing library collections and containers by custom objects in which data is restructured for efficient access. However, changing the data representation is a tedious and error-prone process that makes it hard to test, maintain and evolve the source code.

We present an automated and composable mechanism that allows programmers to safely change the data representation in delimited scopes containing anything from expressions to entire class definitions. To achieve this, programmers define a transformation and our mechanism automatically and transparently applies it during compilation, eliminating the need to manually change the source code.

Our technique leverages the type system in order to offer correctness guarantees on the transformation and its interaction with object-oriented language features, such as dynamic dispatch, inheritance and generics.

We have embedded this technique in a Scala compiler plugin and used it in four very different transformations, ranging from improving the data layout and encoding, to
retrofitting specialization and value class status, and all the way to collection deforestation. On our benchmarks, the technique obtained speedups between 1.8x and 24.5x.

This is a realization of an idea that has been briefly discussed here on LtU a few times, whereby a program is written using high-level representations, and the user has the option to provide a lowering to a more efficient representation after the fact.

This contrasts with the typical approach of providing efficient primitives, like primitive unboxed values, and leaving it to the programmer to compose them efficiently up front.

Simon Peyton Jones elected into the Royal Society Fellowship

Simon Peyton Jones has been elected as a Fellow of the Royal Society. The Royal Society biography reads:

Simon's main research interest is in functional programming languages, their implementation, and their application. He was a key contributor to the design of the now-standard functional language Haskell, and is the lead designer of the widely-used Glasgow Haskell Compiler (GHC). He has written two textbooks about the implementation of functional languages.

More generally, Simon is interested in language design, rich type systems, compiler technology, code generation, runtime systems, virtual machines, and garbage collection. He is particularly motivated by direct use of principled theory to practical language design and implementation -- that is one reason he loves functional programming so much.

Simon is also chair of Computing at School, the grass-roots organisation that was at the epicentre of the 2014 reform of the English computing curriculum.

Congratulations SPJ!

mbeddr: an Extensible C-based Programming Language and IDE for Embedded Systems

Markus Voelter, Bernd Kolb1, Daniel Ratiu, and Bernhard Schaetz, "mbeddr: an Extensible C-based Programming Language and IDE for Embedded Systems", SplashCON/Wavefront 2012.

Although embedded systems are an increasingly large part of our lives, and despite the fact that embedded software would undoubtedly benefit from the kind safety guarantees provided by more advanced type systems, most embedded software development is still done in C. That's partly a result of toolchain availability, and partly because many more advanced languages typically impose requirements on memory, dynamic memory allocation, and other runtime infrastructure that simply aren't supportable on a lot of resource-constrained microcontrollers or acceptable in a risk-averse environment. Mbeddr seems to be seeking a middle ground between C, and creating a whole new language. From the paper:

While the C programming language provides very good support for writing efficient, low-level code, it does not offer adequate means for defining higher-level abstractions relevant to embedded software. In this paper we present the mbeddr technology stack that supports extension of C with constructs adequate for embedded systems. In mbeddr, efficient low-level programs can be written using the well-known concepts from C. Higher level domain-specific abstractions can be seamlessly integrated into C by means of modular language extension regarding syntax, type system, semantics and IDE. In the paper we show how language extension can address the challenges of embedded software development and report on our experience in building these extensions. We show that language workbenches deliver on the promise of significantly reducing the effort of language engineering and the construction of corresponding IDEs. mbeddr is built on top of the JetBrains MPS language workbench. Both MPS and mbeddr are open source software

It appears that mbeddr allows multiple DSLs to be built on top of C to provide greater safety and more domain-specific expressions of typical embedded software patterns. Additionally, it provides integration with various analysis tools including model-checkers. Another paper, "Preliminary Experience of using mbeddr for Developing Embedded Software", provides a look at how all of these things fit together in use.

The mbeddr approach seems similar in concept to Ivory and Tower, although mbeddr uses JetBrains MPS as the platform for creating DSLs instead of building an embedded DSL in Haskell.

Don Syme receives a medal for F#

Don Syme receives the Royal Academy of Engineering's Silver Medal for his work on F#. The citation reads:

F# is known for being a clear and more concise language that interoperates well with other systems, and is used in applications as diverse asanalysing the UK energy market to tackling money laundering. It allows programmers to write code with fewer bugs than other languages, so users can get their programme delivered to market both rapidly and accurately. Used by major enterprises in the UK and worldwide, F# is both cross-platform and open source, and includes innovative features such as unit-of-measure inference, asynchronous programming and type providers, which have in turn influenced later editions of C# and other industry languages.


Safely Composable Type-Specific Languages

Cyrus Omar, Darya Kurilova, Ligia Nistor, Benjamin Chung, Alex Potanin, and Jonathan Aldrich, "Safely Composable Type-Specific Languages", ECOOP14.

Programming languages often include specialized syntax for common datatypes (e.g. lists) and some also build in support for specific specialized datatypes (e.g. regular expressions), but user-defined types must use general-purpose syntax. Frustration with this causes developers to use strings, rather than structured data, with alarming frequency, leading to correctness, performance, security, and usability issues. Allowing library providers to modularly extend a language with new syntax could help address these issues. Unfortunately, prior mechanisms either limit expressiveness or are not safely composable: individually unambiguous extensions can still cause ambiguities when used together. We introduce type-specific languages (TSLs): logic associated with a type that determines how the bodies of generic literals, able to contain arbitrary syntax, are parsed and elaborated, hygienically. The TSL for a type is invoked only when a literal appears where a term of that type is expected, guaranteeing non-interference. We give evidence supporting the applicability of this approach and formally specify it with a bidirectionally typed elaboration semantics for the Wyvern programming language.


It is increasingly important for applications to protect user privacy. Unfortunately, it is often non-trivial for programmers to enforce privacy policies. We have developed Jeeves to make it easier for programmers to enforce information flow policies: policies that describe who can see what information flows through a program. Jeeves allows the programmer to write policy-agnostic programs, separately implementing policies on sensitive values from other functionality. Just like Wooster's clever valet Jeeves in Wodehouse's stories, the Jeeves runtime does the hard work, automatically enforcing the policies to show the appropriate output to each viewer.

From what I gather, Jeeves takes Aspect Oriented approach to privacy. This is of course not a new idea. I presume that many of the classic problems with AOP would apply to Jeeves. Likewise, using information flow analysis for handling privacy policies is not an new idea. Combining the two, however, seems like a smart move. Putting the enforcement at the run-time level makes this sound more practical than other ideas I have heard before. Still, I personally think that specifying privacy policies at the end-user level and clarifying the concept of privacy at the normative, legal and conceptual levels are more pressing concerns. Indeed, come to think of it: I don't really recall a privacy breach that was caused by a simple information flow bug. Privacy expectations are broken on purpose by many companies and major data breaches occur when big databases are shared (recall the Netflix Prize thing). Given this, I assume the major use-case is for Apps, maybe even as a technology that someone like Apple could use to enforce the compliance of third-party Apps to their privacy policies.

I haven't looked too closely, so comments from more informed people are welcome.

Jeeves is implemented as an embedded DSL in Scala and Python.

Simple Generators v. Lazy Evaluation

Oleg Kiselyov, Simon Peyton-Jones and Amr Sabry: Simple Generators:

Incremental stream processing, pervasive in practice, makes the best case for lazy evaluation. Lazy evaluation promotes modularity, letting us glue together separately developed stream producers, consumers and transformers. Lazy list processing has become a cardinal feature of Haskell. It also brings the worst in lazy evaluation: its incompatibility with effects and unpredictable and often extraordinary use of memory. Much of the Haskell programming lore are the ways to get around lazy evaluation.

We propose a programming style for incremental stream processing based on typed simple generators. It promotes modularity and decoupling of producers and consumers just like lazy evaluation. Simple generators, however, expose the implicit suspension and resumption inherent in lazy evaluation as computational effects, and hence are robust in the presence of other effects. Simple generators let us accurately reason about memory consumption and latency. The remarkable implementation simplicity and efficiency of simple generators strongly motivates investigating and pushing the limits of their expressiveness.

To substantiate our claims we give a new solution to the notorious pretty-printing problem. Like earlier solutions, it is linear, backtracking-free and with bounded latency. It is also modular, structured as a cascade of separately developed stream transducers, which makes it simpler to write, test and to precisely analyze latency, time and space consumption. It is compatible with effects including IO, letting us read the source document from a file, and format it as we read.

This is fascinating work that shows how to gain the benefits of lazy evaluation - decoupling of producers, transformers, and consumers of data, and producing only as much data as needed - in a strict, effectful setting that works well with resources that need to be disposed of once computation is done, e.g. file handles.

The basic idea is that of Common Lisp signal handling: use a hierarchical, dynamically-scoped chain of handler procedures, which get called - on the stack, without unwinding it - to parameterize code. In this case, the producer code (which e.g. reads a file character by character) is the parameterized code: every time data (a character) is produced, it calls the dynamically innermost handler procedure with the data (it yields the data to the handler). This handler is the data consumer (it could e.g. print the received character to the console). Through dynamic scoping, each handler may also have a super-handler, to which it may yield data. In this way, data flows containing multiple transformers can be composed.

I especially like the OCaml version of the code, which is just a page of code, implementing a dynamically-scoped chain of handlers. After that we can already write map and fold in this framework (fold using a loop and a state cell, notably.) There's more sample code.

This also ties in with mainstream yield.

Koka a function oriented language with effect inference

Koka is a function-oriented programming language that seperates pure values from side-effecting computations, where the effect of every function is automatically inferred. Koka has many features that help programmers to easily change their data types and code organization correctly, while having a small language core with a familiar JavaScript like syntax.

Koka extends the idea of using row polymorphism to encode an effect system and the relations between them. Daan Leijen is the primary researcher behind it and his research was featured previously on LtU, mainly on row polymorphism in the Morrow Language.

So far there's no paper available on the language design, just the slides from a Lang.Next talk (which doesn't seem to have video available at Channel 9), but it's in the program for HOPE 2012.

Effective Scala

It's from Twitter, It's open source, It's about Scala. What's not to like?

XML feed