kdb+ 3.5 released last month

kdb+ is a real-time time series database, known in the financial services universe as the fastest tick database on the market. It was first conceived by Arthur Whitney at Morgan Stanley as a prototype, and over the last 35+ years has grown to add many features. The database makes such aggressive usage of mmap() POSIX function for mapping file chunks into main memory, to the point where it has exposed issues with the implementation of mmap itself.

Recently, the company now behind kdb+ has also built Kx for DAAS (Data-as-a-Service), which is basically a cloud-based, massively clustered version of kdb+ that deals with the curious oddity that kdb+ is effectively entirely singly threaded. For those interested in reading more about kdb+'s unique cloud architecture (as compared to "big data" solutions like Hadoop), you can read the following whitepapers as suggestive guidelines for how the q community thinks about truly "big data" several orders of magnitude faster and larger than most Hadoop data sets:

While I don't suggest these papers are the blueprint for copying/mimicking the DAAS product, it does help the LtU reader imagine a "different world" of data processing than the often cited Map/Reduce paper and other more mainstream approaches. What is particularly striking is how tiny q.exe (the program that runs kdb+ and provides a CLI for q scripting) is. Language researchers are looking at provably correct C compilers, and it is not a huge leap to think about the world soon seeing provably correct real-time time series databases using kdb+ as an inspiration.

Another curiosity, relevant to us here at LtU, is that kdb+ has its own programming language, q. q is a variant of APL with a special library for statistics. Most "big data" solutions don't have native implementations for weighted average, which is a fairly important and frequently used function in quantitative finance, useful for computing volume weighted average price (VWAP) as well as tilt and weighted spread. q is itself implemented in another language, k. The whole language of each is just a couple lines of (terse) code.

PowerShell is open sourced and is available on Linux

Long HN thread ensues. Many of the comments discuss the benefits/costs of basing pipes on typed objects rather than text streams. As someone who should be inclined in favor of the typed object approach I have to say that I think the text-only folks have the upper hand at the moment. Primary reason is that text as a lingua franca between programs ensures interoperability (and insurance against future changes to underlying object models) and self-documenting code. Clearly the Achilles' heel is parsing/unparsing.

As happens often, one is reminded of the discussions of DSLs and pipelines in Jon Bentley's Programming Pearls...

BWK on "How to succeed in language design without really trying"

A talk by Brian Kernighan at The University of Nottingham.

Describes all the usual suspects: AWK, EQN, PIC. Even AMPL. I was wondering which languages he had in mind when he mentioned that some of his creations were total flops. I'd love to learn from those!

The talk is a fun way to spend an hour, and the audio would be good for commuters. For real aficionados I would recommend reading Jon Bentley's articles on the design of these languages (as well as CHEM and others) instead.

Portable Efficient Assembly Code-generation in High-level Python

PeachPy is a Python framework for writing high-performance assembly kernels.

PeachPy aims to simplify writing optimized assembly kernels while preserving all optimization opportunities of traditional assembly.

You can use the same code to generate assembly for Windows, Unix, and Golang assembly. The library handles the various ABIs automatically. I haven't seen this cool project before.

Among the cool features is the ability to invoke the generated assembly as regular Python functions. Nice.

mbeddr: an Extensible C-based Programming Language and IDE for Embedded Systems

Markus Voelter, Bernd Kolb1, Daniel Ratiu, and Bernhard Schaetz, "mbeddr: an Extensible C-based Programming Language and IDE for Embedded Systems", SplashCON/Wavefront 2012.

Although embedded systems are an increasingly large part of our lives, and despite the fact that embedded software would undoubtedly benefit from the kind safety guarantees provided by more advanced type systems, most embedded software development is still done in C. That's partly a result of toolchain availability, and partly because many more advanced languages typically impose requirements on memory, dynamic memory allocation, and other runtime infrastructure that simply aren't supportable on a lot of resource-constrained microcontrollers or acceptable in a risk-averse environment. Mbeddr seems to be seeking a middle ground between C, and creating a whole new language. From the paper:

While the C programming language provides very good support for writing efficient, low-level code, it does not offer adequate means for defining higher-level abstractions relevant to embedded software. In this paper we present the mbeddr technology stack that supports extension of C with constructs adequate for embedded systems. In mbeddr, efficient low-level programs can be written using the well-known concepts from C. Higher level domain-specific abstractions can be seamlessly integrated into C by means of modular language extension regarding syntax, type system, semantics and IDE. In the paper we show how language extension can address the challenges of embedded software development and report on our experience in building these extensions. We show that language workbenches deliver on the promise of significantly reducing the effort of language engineering and the construction of corresponding IDEs. mbeddr is built on top of the JetBrains MPS language workbench. Both MPS and mbeddr are open source software

It appears that mbeddr allows multiple DSLs to be built on top of C to provide greater safety and more domain-specific expressions of typical embedded software patterns. Additionally, it provides integration with various analysis tools including model-checkers. Another paper, "Preliminary Experience of using mbeddr for Developing Embedded Software", provides a look at how all of these things fit together in use.

The mbeddr approach seems similar in concept to Ivory and Tower, although mbeddr uses JetBrains MPS as the platform for creating DSLs instead of building an embedded DSL in Haskell.

Everything old is new again: Quoted Domain Specific Languages

Everything old is new again: Quoted Domain Specific Languages, by Shayan Najd, Sam Lindley, Josef Svenningsson, Philip Wadler:

We describe a new approach to domain specific languages (DSLs), called Quoted DSLs (QDSLs), that resurrects two old ideas: quotation,
from McCarthy’s Lisp of 1960, and the subformula property, from Gentzen’s natural deduction of 1935. Quoted terms allow the DSL to share the syntax and type system of the host language. Normalising quoted terms ensures the subformula property, which guarantees that one can use higher-order types in the source while guaranteeing first-order types in the target, and enables using types to guide fusion. We test our ideas by re-implementing Feldspar, which was originally implemented as an Embedded DSL (EDSL), as a QDSL; and we compare the QDSL and EDSL variants.

Neat paper first posted here by Blaisorblade that formalizes the properties needed for using quotation to implement DSLs. Most embedded DSLs use custom operators and lifted conditionals to make embedded programs seem more natural, but quoting enables the use of native operators and conditions, and might lead to more natural expressions of lightweight (possibly heterogenous) staged computations.

Safely Composable Type-Specific Languages

Cyrus Omar, Darya Kurilova, Ligia Nistor, Benjamin Chung, Alex Potanin, and Jonathan Aldrich, "Safely Composable Type-Specific Languages", ECOOP14.

Programming languages often include specialized syntax for common datatypes (e.g. lists) and some also build in support for specific specialized datatypes (e.g. regular expressions), but user-defined types must use general-purpose syntax. Frustration with this causes developers to use strings, rather than structured data, with alarming frequency, leading to correctness, performance, security, and usability issues. Allowing library providers to modularly extend a language with new syntax could help address these issues. Unfortunately, prior mechanisms either limit expressiveness or are not safely composable: individually unambiguous extensions can still cause ambiguities when used together. We introduce type-specific languages (TSLs): logic associated with a type that determines how the bodies of generic literals, able to contain arbitrary syntax, are parsed and elaborated, hygienically. The TSL for a type is invoked only when a literal appears where a term of that type is expected, guaranteeing non-interference. We give evidence supporting the applicability of this approach and formally specify it with a bidirectionally typed elaboration semantics for the Wyvern programming language.

A Next Generation Smart Contract and Decentralized Application Platform

A Next Generation Smart Contract and Decentralized Application Platform, Vitalik Buterin.

When Satoshi Nakamoto first set the Bitcoin blockchain into motion in January 2009, he was simultaneously introducing two radical and untested concepts. The first is the "bitcoin", a decentralized peer-to-peer online currency that maintains a value without any backing, intrinsic value or central issuer. So far, the "bitcoin" as a currency unit has taken up the bulk of the public attention, both in terms of the political aspects of a currency without a central bank and its extreme upward and downward volatility in price. However, there is also another, equally important, part to Satoshi's grand experiment: the concept of a proof of work-based blockchain to allow for public agreement on the order of transactions. Bitcoin as an application can be described as a first-to-file system: if one entity has 50 BTC, and simultaneously sends the same 50 BTC to A and to B, only the transaction that gets confirmed first will process. There is no intrinsic way of determining from two transactions which came earlier, and for decades this stymied the development of decentralized digital currency. Satoshi's blockchain was the first credible decentralized solution. And now, attention is rapidly starting to shift toward this second part of Bitcoin's technology, and how the blockchain concept can be used for more than just money.

Commonly cited applications include using on-blockchain digital assets to represent custom currencies and financial instruments ("colored coins"), the ownership of an underlying physical device ("smart property"), non-fungible assets such as domain names ("Namecoin") as well as more advanced applications such as decentralized exchange, financial derivatives, peer-to-peer gambling and on-blockchain identity and reputation systems. Another important area of inquiry is "smart contracts" - systems which automatically move digital assets according to arbitrary pre-specified rules. For example, one might have a treasury contract of the form "A can withdraw up to X currency units per day, B can withdraw up to Y per day, A and B together can withdraw anything, and A can shut off B's ability to withdraw". The logical extension of this is decentralized autonomous organizations (DAOs) - long-term smart contracts that contain the assets and encode the bylaws of an entire organization. What Ethereum intends to provide is a blockchain with a built-in fully fledged Turing-complete programming language that can be used to create "contracts" that can be used to encode arbitrary state transition functions, allowing users to create any of the systems described above, as well as many others that we have not yet imagined, simply by writing up the logic in a few lines of code.

Includes code samples.


It is increasingly important for applications to protect user privacy. Unfortunately, it is often non-trivial for programmers to enforce privacy policies. We have developed Jeeves to make it easier for programmers to enforce information flow policies: policies that describe who can see what information flows through a program. Jeeves allows the programmer to write policy-agnostic programs, separately implementing policies on sensitive values from other functionality. Just like Wooster's clever valet Jeeves in Wodehouse's stories, the Jeeves runtime does the hard work, automatically enforcing the policies to show the appropriate output to each viewer.

From what I gather, Jeeves takes Aspect Oriented approach to privacy. This is of course not a new idea. I presume that many of the classic problems with AOP would apply to Jeeves. Likewise, using information flow analysis for handling privacy policies is not an new idea. Combining the two, however, seems like a smart move. Putting the enforcement at the run-time level makes this sound more practical than other ideas I have heard before. Still, I personally think that specifying privacy policies at the end-user level and clarifying the concept of privacy at the normative, legal and conceptual levels are more pressing concerns. Indeed, come to think of it: I don't really recall a privacy breach that was caused by a simple information flow bug. Privacy expectations are broken on purpose by many companies and major data breaches occur when big databases are shared (recall the Netflix Prize thing). Given this, I assume the major use-case is for Apps, maybe even as a technology that someone like Apple could use to enforce the compliance of third-party Apps to their privacy policies.

I haven't looked too closely, so comments from more informed people are welcome.

Jeeves is implemented as an embedded DSL in Scala and Python.


I was surprised to see that DYNAMO hasn't been mentioned here in the past. DYNAMO (DYNAmic MOdels) was the simulation language used to code the simulations that led to the famous 1972 book The Limits to Growth from The Club of Rome. The language was designed in the late 1950s. It is clear that the language was used in several other places and evolved through several iterations, though I am not sure how extensively it was used. When Stafford Beer was creating Cybersyn for Salvador Allende he used DYNAMO to save time suggesting it was somewhat of a standard tool (this is described in Andrew Pickering's important book The Cybernetic Brain).

The language itself is essentially what you'd expect. It is declarative, programs consisting of a set of equations. The equations are zero and first-order difference equations of two kinds: level equations (accumulations) and rate equations (flows). Computation is integration over time. Levels can depend on rates and vice versa with the language automatically handling dependencies and circularities. Code looks like code looked those days: fixed columns, all caps, eight characters identifiers.

Here are a few links:

  • Section 3.7 of this history of discrete event simulation languages is a succinct description of the history of the language and its main features.
  • A more leisurely description of the language and the Limits to Growth model can be found in this article. Ironically, the author of the article reimplemented the model in Javascript (run it!). What was originally written in a DSL is now implemented in a general purpose language, with all the niceties handled manually.
  • Finally, a nice piece on Jay Forrester who prompted the creation of SIMPLE and DYNAMO, its offspring.
XML feed