kdb+ is a real-time time series database, known in the financial services universe as the fastest tick database on the market. It was first conceived by Arthur Whitney at Morgan Stanley as a prototype, and over the last 35+ years has grown to add many features. The database makes such aggressive usage of mmap() POSIX function for mapping file chunks into main memory, to the point where it has exposed issues with the implementation of mmap itself.
Recently, the company now behind kdb+ has also built Kx for DAAS (Data-as-a-Service), which is basically a cloud-based, massively clustered version of kdb+ that deals with the curious oddity that kdb+ is effectively entirely singly threaded. For those interested in reading more about kdb+'s unique cloud architecture (as compared to "big data" solutions like Hadoop), you can read the following whitepapers as suggestive guidelines for how the q community thinks about truly "big data" several orders of magnitude faster and larger than most Hadoop data sets:
While I don't suggest these papers are the blueprint for copying/mimicking the DAAS product, it does help the LtU reader imagine a "different world" of data processing than the often cited Map/Reduce paper and other more mainstream approaches. What is particularly striking is how tiny q.exe (the program that runs kdb+ and provides a CLI for q scripting) is. Language researchers are looking at provably correct C compilers, and it is not a huge leap to think about the world soon seeing provably correct real-time time series databases using kdb+ as an inspiration.
Another curiosity, relevant to us here at LtU, is that kdb+ has its own programming language, q. q is a variant of APL with a special library for statistics. Most "big data" solutions don't have native implementations for weighted average, which is a fairly important and frequently used function in quantitative finance, useful for computing volume weighted average price (VWAP) as well as tilt and weighted spread. q is itself implemented in another language, k. The whole language of each is just a couple lines of (terse) code.
In this year's POPL, Bob Atkey made a splash by showing how to get from parametricity to conservation laws, via Noether's theorem:
Graydon Hoare has an excellent series of (two) blog posts about programming languages for interactive scientific computing.
The scenario of these posts is to explain and constrast the difference between two scientific computing languages, Python and "SciPy/SymPy/NumPy, IPython, and Sage" on one side, and Julia on the other, as the result of two different design traditions, one (Python) following Ousterhout's Dichotomy of having a convenient scripting language on top of a fast system language, and the other rejecting it (in the tradition of Lisp/Dylan and ML), promoting a single general-purpose language.
I don't necessarily buy the whole argument, but the posts are a good read, and have some rather insightful comments about programming language use and design.
Quotes from the first post:
Quotes from the second:
This interesting blog post argues that in recent years Python has gained libraries making it the choice language for scientific computing (over MATLAB and R primarily).
I find the details discussed in the post interesting. Two small points that occur to me are that in several domains Mathematica is still the tool of choice. From what I could see nothing free, let alone open source, is even in the same ballpark in these cases. Second, I find it interesting that several of the people commenting mentioned IPython. It seems to be gaining ground as the primary environment many people use.
Active forum topics
New forum topics