Half-baked idea: mini-language for vector processing

I've been looking at K. It's a truly marvelous tool for thinking about vectors, with the added bonus that your thoughts are automatically executable. Lots of interesting things can be done with (short) one-liners in K: "standard deviation", "are those two arrays permutations of each other", "transform seconds to hours/min/sec" etc.
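For comparison with those one-liners, here is roughly what the last of them, "transform seconds to hours/min/sec", looks like as a plain-C loop. It is a sketch only: the name `encode` and its signature are my own choice for illustration, not anything from K.

```c
#include <assert.h>
#include <stddef.h>

// Mixed-radix decomposition: split value into n digits, most significant
// first, using the bases in radix[]. With radix = {24, 60, 60} this turns
// a count of seconds into hours, minutes, seconds; the leading digit wraps
// at 24, so whole days are discarded.
void encode(long value, const long *radix, long *digits, size_t n)
{
    assert(value >= 0);
    for (size_t i = n; i-- > 0;) {
        assert(radix[i] > 0);
        digits[i] = value % radix[i];   // remainder is this digit
        value /= radix[i];              // carry the rest to the next digit
    }
}
```

For example, 3725 seconds with bases {24, 60, 60} gives the digits {1, 2, 5}.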

And it looks like line noise. Smells just like regular expressions. Aha!

An idea naturally occurred to me: make a C library (for example) that takes C strings as K-style "array processing expressions", and C arrays as arguments. Just like regexes. I even thought of a cool name, "Knot", i.e. "Not K" or "K Notation" :-)
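To make the idea concrete, here is a minimal sketch of what a call into such a library might look like. Everything in it is hypothetical: `knot_eval`, its status codes, and the handful of accepted expression strings are invented for illustration, and the sketch simply pattern-matches a few fixed forms instead of parsing real K syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical result codes for the "Knot" sketch.
typedef enum { KNOT_OK, KNOT_SYNTAX, KNOT_DOMAIN } knot_status;

// Evaluate a K-style expression string against one C array named x.
// Recognized forms (a toy subset, not a parser):
//   "#x" count, "+/x" sum, "*/x" product, "|/x" maximum, "&/x" minimum
knot_status knot_eval(const char *expr, const double *x, size_t n, double *out)
{
    assert(expr != NULL && out != NULL);
    if (strcmp(expr, "#x") == 0) {
        *out = (double)n;
        return KNOT_OK;
    }
    // Everything else must look like "<op>/x".
    if (strlen(expr) != 3 || strcmp(expr + 1, "/x") != 0)
        return KNOT_SYNTAX;

    char op = expr[0];
    if (op != '+' && op != '*' && op != '|' && op != '&')
        return KNOT_SYNTAX;

    int is_minmax = (op == '|' || op == '&');
    if (is_minmax && n == 0)
        return KNOT_DOMAIN;   // no identity element for max/min in this sketch

    double acc = is_minmax ? x[0] : (op == '*' ? 1.0 : 0.0);
    for (size_t i = is_minmax ? 1 : 0; i < n; i++) {
        switch (op) {
        case '+': acc += x[i]; break;
        case '*': acc *= x[i]; break;
        case '|': if (x[i] > acc) acc = x[i]; break;
        case '&': if (x[i] < acc) acc = x[i]; break;
        }
    }
    *out = acc;
    return KNOT_OK;
}
```

A real library would presumably compile an expression once, as regex libraries do, and then apply it to many arrays; this sketch skips that step and re-examines the string on every call.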

Unfortunately, I'm no Arthur Whitney, and don't have the skills to pull this off. So I decided to set the idea afloat here on LtU, hoping that something might become of it.

Is this feasible? Is this useful? Has this been done?


Fine idea

I know that this has been done commercially and is in use by some companies (full disclosure: I wrote it, and the company I work for licensed it to others). The initial version was interpreted; later ones used some just-in-time compilation. Going beyond that, one project I'm thinking of redoing was to take a notation in the same spirit and generate SSE- or AltiVec-optimized code. The idea is that you write your kernel algorithm in that notation and get optimized code that is callable from C or whatever. There's no reason other backends couldn't be supported, but those were the two we had interest in.

I've been browsing the archives here and found the discussions of these notations, and of potential languages for games programming, interesting. I'm looking for a new project at the moment, and it's great food for thought.

Nice to hear that!

It's a small world: I stumbled upon your website just yesterday and read through most of it :-)

So there's demand for this sort of thing. Good! I've almost convinced myself to start the ball rolling, maybe by implementing + / []. Can't be too hard, because I don't really want speed, just the embeddable notation - and the notation itself is already laid out in the K spec.
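As a rough idea of how small those three primitives are, here is a plain-C sketch under my own assumptions: `add`, `over`, `plus`, and `index_by` are illustrative names, vectors are `double` arrays with explicit lengths, and `over` requires a non-empty vector because this sketch has no notion of identity elements.

```c
#include <assert.h>
#include <stddef.h>

// A dyadic function such as + for use with "over".
typedef double (*dyad)(double, double);

double plus(double a, double b) { return a + b; }

// x+y with scalar extension: a length-1 side is paired with every
// element of the other side; otherwise the lengths must agree.
void add(const double *x, size_t nx, const double *y, size_t ny, double *out)
{
    assert(nx == ny || nx == 1 || ny == 1);
    size_t n = (nx == 1) ? ny : nx;
    for (size_t i = 0; i < n; i++)
        out[i] = x[nx == 1 ? 0 : i] + y[ny == 1 ? 0 : i];
}

// f/x ("over"): fold a dyadic function across x, left to right.
double over(dyad f, const double *x, size_t n)
{
    assert(n > 0);
    double acc = x[0];
    for (size_t i = 1; i < n; i++)
        acc = f(acc, x[i]);
    return acc;
}

// x[idx] with an index vector: gather the selected elements into out.
void index_by(const double *x, size_t nx, const size_t *idx, size_t ni,
              double *out)
{
    for (size_t k = 0; k < ni; k++) {
        assert(idx[k] < nx);
        out[k] = x[idx[k]];
    }
}
```

With these, `+/x` becomes `over(plus, x, n)`, and indexing with a vector of positions becomes `index_by(x, n, idx, m, out)`.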

Give me a couple of days to think about it. Honestly, I'm afraid.

vlerq

You might find vlerq, which I've mentioned in passing before, interesting. "Vlerq" stands for "Vectors, Language, Embeddable, Relational, Query." Most of the wrapping work has been done in Tcl, but the core is a single C file plus some support for vectorized instruction sets; in other words, it sounds like exactly what you're describing.

almost :-/

I've actually been aware of vlerq for a long time. It seems to lack... focus? It's more of a "look what I've made" than "look what you can do with this"...

Also, the most valuable thing for me in K is powerful/accessible notation. Apparently it isn't a major concern for vlerq.

K syntax

Adding a K syntax to Vlerq would be fascinating. I don't grasp K (or J) well enough to make it happen, but I can definitely help implement specific primitives underneath (in C or Thrill) to make all operators available.

It isn't hard

The most useful parts are also the simplest.

element-wise sum: x+y (if one side is a scalar, it is added to every element of the other)
sum of a vector: +/x
number of elements: #x
arithmetic mean: +/x%#x
a sorting permutation: <x
sort: x[<x]
two arrays are permutations of each other: x[<x]~y[<y]
compare two arrays element-wise, yielding an array of 0s and 1s: x<y
indexes of the 1s in a boolean (0/1) array: &x
drop all elements greater than or equal to a threshold y: x[&x<y]

And so on. It's really that simple and useful.
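For comparison, here are a few of those primitives spelled out in C. This is a sketch under my own assumptions (the function names, `double` data, caller-supplied output buffers): `grade_up` uses a stable insertion sort for brevity where a real implementation would use an O(n log n) sort, and `where` only handles the 0/1 case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// <x ("grade up"): fill perm with the indexes that sort x ascending.
// Stable insertion sort: fine for a sketch, O(n^2) in general.
void grade_up(const double *x, size_t n, size_t *perm)
{
    for (size_t i = 0; i < n; i++) {
        size_t j = i;
        while (j > 0 && x[perm[j - 1]] > x[i]) {
            perm[j] = perm[j - 1];   // shift larger entries right
            j--;
        }
        perm[j] = i;
    }
}

// &x for a 0/1 vector: write the indexes of the 1s, return their count.
size_t where(const int *b, size_t n, size_t *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        assert(b[i] == 0 || b[i] == 1);
        if (b[i])
            out[k++] = i;
    }
    return k;
}

// x[<x]~y[<y]: grade both arrays, then compare the sorted sequences.
// px and py are scratch buffers of nx and ny entries.
bool is_permutation(const double *x, size_t nx, const double *y, size_t ny,
                    size_t *px, size_t *py)
{
    if (nx != ny)
        return false;            // match also compares lengths
    grade_up(x, nx, px);
    grade_up(y, ny, py);
    for (size_t i = 0; i < nx; i++)
        if (x[px[i]] != y[py[i]])
            return false;
    return true;
}

// x[&x<y] with a scalar threshold t, fused into one pass:
// copy the elements below t to out and return how many were kept.
size_t keep_below(const double *x, size_t n, double t, double *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (x[i] < t)
            out[k++] = x[i];
    return k;
}
```

The fused `keep_below` skips the intermediate boolean and index vectors that the K expression conceptually builds; fusing steps like that is one of the things a compiling implementation could do automatically.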

vlerq - simd?

Are you sure vlerq actually uses vector instructions? Browsing and grepping the code didn't turn up any examples: there is a file that implements some vector operations, but they are written as plain C loops over arrays rather than with SIMD instructions.
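To illustrate the distinction: in the sketch below, the final loop is an ordinary C loop over arrays, while the block guarded by `#ifdef __SSE__` uses the standard `<xmmintrin.h>` intrinsics to add four floats per instruction. This is generic example code, not anything taken from vlerq; `add_f32` is a made-up name, and the SSE path is compiled only where the compiler defines `__SSE__`.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __SSE__
#include <xmmintrin.h>
#endif

// Element-wise out = a + b for float arrays of length n.
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));
    size_t i = 0;
#ifdef __SSE__
    // SIMD path: four floats per load/add/store.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Plain C loop: handles the tail, or everything when SSE is unavailable.
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Optimizing compilers can often auto-vectorize the plain loop on their own, so how much the explicit intrinsics buy in practice depends on the compiler and flags.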

no simd yet

Correct, Vlerq does not use vector instructions right now. The data structures are vector-oriented, which is a prerequisite. The set of opcodes is open-ended (i.e. anyone could add new ones, even as a dynamically loaded extension). I'm focusing on data structures and representations at this stage. And XML mappings, and persistence, and relational algebra, and language bindings, and memory usage.

My Bad

I'd thought the vectorization support was already in there. My mistake!

Also regarding vlerq: the build process assumes that "." and tclkit are on your PATH. For Mac OS X users, the TclTkAqua installer installs everything you need but doesn't add /Library/Tcl/bin to the PATH. Although this is documented in the TclTkAqua distribution, you may wish to mention it in your docs, since some of us hadn't read the TclTkAqua docs in a long time. :-) Also, if tclkit isn't on your PATH, something goes wrong in a way that fixing the PATH and rerunning "make" doesn't cure, even after a "make clean": you have to wipe out the vlerq directory and start over. I don't yet understand why.

In any case, thanks as always for your hard work on great tools like MetaKit, tclkit, and vlerq!

Thanks

Made some changes, as reported here.

there's always A+

Arthur Whitney's previous project was the A+ dialect of APL. The C code is intimidating, but there's probably a way you can pass it strings to evaluate.