What is the best system for experimenting with and visualizing complex algorithms?

Let's say you see a sophisticated algorithm in a textbook, with a precise definition in terms of high-level operations. For example, in his textbook on Machine Learning, Alpaydın derives an algorithm for finding the highest probability state sequence for a Hidden Markov Model given an observation sequence. The algorithm has a precise definition but it is very high level and it leans heavily on the accompanying mathematical derivation. All of the implementation details are left out. I would like to code the algorithm, combine it with other algorithms, and make variations. I would like to test the algorithm and visualize its results. More broadly, I would like to experiment with machine learning and knowledge-intensive algorithms. My goal is to understand the possibilities and limitations of machine learning. (But other people might have other goals with such a system.)
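
To make the example concrete: finding the most probable state sequence of an HMM is usually done with what is known as the Viterbi algorithm. Below is a minimal sketch of it in plain JavaScript, not the textbook's formulation. The function name, the argument layout, and the toy two-state model are all assumptions made up for illustration, and the sketch assumes a non-empty observation sequence.

```javascript
// Viterbi sketch: most probable hidden-state path for an observation sequence.
// Works in log space to avoid underflow. All names here are illustrative.
function viterbi(obs, states, startP, transP, emitP) {
  // delta[t][s]: log-probability of the best path that ends in state s at time t
  // back[t][s]:  predecessor of s on that best path
  const delta = [{}];
  const back = [{}];
  for (const s of states) {
    delta[0][s] = Math.log(startP[s]) + Math.log(emitP[s][obs[0]]);
    back[0][s] = null;
  }
  for (let t = 1; t < obs.length; t++) {
    delta.push({});
    back.push({});
    for (const s of states) {
      let best = -Infinity;
      let arg = null;
      for (const r of states) {
        const score = delta[t - 1][r] + Math.log(transP[r][s]);
        if (score > best) { best = score; arg = r; }
      }
      delta[t][s] = best + Math.log(emitP[s][obs[t]]);
      back[t][s] = arg;
    }
  }
  // Pick the best final state, then follow the back-pointers.
  const T = obs.length - 1;
  let last = states[0];
  for (const s of states) {
    if (delta[T][s] > delta[T][last]) last = s;
  }
  const path = [last];
  for (let t = T; t > 0; t--) path.unshift(back[t][path[0]]);
  return { path: path, logProb: delta[T][last] };
}

// Toy model (made-up numbers): two hidden states, three possible observations.
const states = ["Healthy", "Fever"];
const startP = { Healthy: 0.6, Fever: 0.4 };
const transP = {
  Healthy: { Healthy: 0.7, Fever: 0.3 },
  Fever:   { Healthy: 0.4, Fever: 0.6 }
};
const emitP = {
  Healthy: { normal: 0.5, cold: 0.4, dizzy: 0.1 },
  Fever:   { normal: 0.1, cold: 0.3, dizzy: 0.6 }
};

const result = viterbi(["normal", "cold", "dizzy"], states, startP, transP, emitP);
console.log(result.path.join(", "));  // Healthy, Healthy, Fever
```

Even for something this small, the indexing, the log-space arithmetic, and the back-pointers are exactly the kind of implementation detail a high-level textbook definition leaves out.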

What's a good system for doing this? Ideally, the system should have a well-designed core language (including lexical scoping, a good concurrency model, closures, factored design, etc.), have a good set of combinators for building algorithms and good libraries for all the most useful high-level operations, be interactive and easy to use, have good I/O abilities including reading and writing concurrent streams and interactive visualization, and be reasonably efficient. This is a tall order. Should I look at a computer algebra system (e.g., Mathematica, Maple, or Sage) to satisfy these requirements? What experience do LtU members have?

This is a tall

This is a tall order.

Probably why nobody has replied yet.

I would be interested in discussing this further, but I don't have proven experience to relate to you. In undergrad I had to build a system out of the Statistical Parametric Mapping MATLAB plug-in, AND an old visualization DSL called IDL, AND convert 25,000 lines of spaghetti C code into something much smaller and more parametric, AND provide a way to pull image files from a laboratory's data feed, which came out of a machine's custom Siemens database. I didn't get to do nearly as much experimenting with mathematical models as I wanted because there wasn't a good way for me to rapidly prototype my ideas. It was annoying. And dipping into MATLAB to invoke function calls so that MATLAB would do the heavy lifting for pre-written SPM routines was simply gross. It felt broken, and it was hard to audit the quality of the implementation. I just made sure I ended up implementing something I knew was right, tested the independent components, and hoped I didn't have any major integration bugs.

I would like to code the algorithm, combine it with other algorithms, and make variations.

The research community investigating factor graphs has come up with some interesting results, such as the sum-product algorithm. If you're not familiar with factor graphs, they are what MS Research has used for TrueSkill and also for Bing Ads. I believe they have used F# to code this, but they also have to do a lot of ad-hoc code splitting in order to slice and dice the data so they can factor-graph it; they ultimately resort to SQL Server as a slave for slicing and dicing large data sets. With WPF, I would presume charting such results would be braindead easy. It is pretty easy to roll your own charting software in WPF - I just started working on a personal project to do so.

I am not sure what you have in mind in terms of inventing and combining algorithms, but the sum-product algorithm has been shown to be a good primitive for studying many kinds of problems.
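
As a rough illustration of the message-passing algorithm mentioned above (usually called sum-product), here is a minimal JavaScript sketch on a three-variable chain. It has nothing to do with how TrueSkill or any MS Research system is coded; the chain A - B - C, the factor tables, and the helper names are all made up for illustration. It computes the marginal of C by passing messages along the chain and checks the answer against brute-force enumeration.

```javascript
// Sum-product sketch on a chain A - B - C of binary variables,
// with factors f1(A), f2(A,B), f3(B,C). All numbers are made up.
const f1 = [0.6, 0.4];
const f2 = [[0.7, 0.3], [0.2, 0.8]];  // f2[a][b]
const f3 = [[0.9, 0.1], [0.5, 0.5]];  // f3[b][c]

// Push a message through a pairwise factor:
// out[j] = sum over i of factor[i][j] * msg[i]
function passMessage(factor, msg) {
  const out = new Array(factor[0].length).fill(0);
  for (let i = 0; i < factor.length; i++) {
    for (let j = 0; j < out.length; j++) {
      out[j] += factor[i][j] * msg[i];
    }
  }
  return out;
}

// Scale a vector of non-negative weights so that it sums to 1.
function normalize(v) {
  const z = v.reduce((acc, x) => acc + x, 0);
  return v.map(x => x / z);
}

// Messages flow A -> B -> C; the message arriving at C is its unnormalized marginal.
const msgToB = passMessage(f2, f1);
const msgToC = passMessage(f3, msgToB);
const marginalC = normalize(msgToC);

// Brute force for comparison: sum the product of all factors over A and B.
function bruteForceMarginalC() {
  const totals = [0, 0];
  for (let a = 0; a < 2; a++) {
    for (let b = 0; b < 2; b++) {
      for (let c = 0; c < 2; c++) {
        totals[c] += f1[a] * f2[a][b] * f3[b][c];
      }
    }
  }
  return normalize(totals);
}

console.log(marginalC, bruteForceMarginalC());
```

The point of the primitive is that the message-passing version never enumerates joint assignments, so its cost grows with the length of the chain rather than exponentially in the number of variables.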

Computer Algebra Systems

Should I look at a computer algebra system (e.g., Mathematica, Maple, or Sage) to satisfy these requirements?

I used Maple in undergrad when I was a TA for a calculus lab for three years. I observed quite a few bugs in this software, and would never use it for anything. I think the only reason we used it was cheap academic licensing as compared with Mathematica. All my friends who went to more engineering-focused colleges had a similar calculus lab, but with Mathematica.

To give you an idea of the bugs in Maple, I believe it was Maple 8 where they introduced a bug that would just randomly trash your Maple documents depending on stuff like whitespace and semicolons, etc. Supposedly a "patch" fixed the problem but I still found problems.

I've used Mathematica before, but not enough to really know much about it. If you are crunching huge data sets and have never used Mathematica before, but are familiar with OCaml, I would suspect a language like F# with WPF for visualization would be much faster (to learn, debug, tune, etc.). Most academics don't have time to learn heavily specialized tools unless they'll be using them for eternity. Plus F# and WPF don't cost anything, so you won't siphon $10,000 from your grant money every time you want to farm out some task to an undergrad to test some idea you have. But maybe that is chump change to you. I know the PIs I worked for hated Mathematica licensing fees and would rather have spent the money on compute farms. It is the viral nature of Mathematica that they seemed to hate: they had no way to keep track of how many licenses they had, who was actually currently using it, how to handle a situation when their budget was empty but two people both needed to use the same time-shared machine (one a funded grad student but not part of the core lab, the other a funded undergrad student but not part of the long-term lab's future), etc. Politically messy.

A Dataflow Language for Scriptable Debugging

This paper shows a language based on FRP that allows you to listen in on operations performed in an algorithm. You can define interactive visualizations of the operations and data structures in the algorithm. This is not specifically geared towards machine learning, but it seems like an interesting approach to algorithm visualization in general. For machine learning you would probably want to add procedures for plotting several kinds of graphs.

Python+numpy+scipy+matplotlib is a combination that doesn't have as good support for interactivity, but for machine learning these are excellent tools. You get linear algebra and optimization routines as well as many options for plotting the results. This is the better choice if you want to learn about machine learning simply because it comes with so many things out of the box, but I think it has less potential than the approach in A Dataflow Language for Scriptable Debugging.

Python Reinteract

Python+numpy+scipy+matplotlib is a combination that doesn't have as good support for interactivity
You might like Reinteract:
Reinteract is a system for interactive experimentation with python. You enter Python code and expressions and immediately see the results. What distinguishes Reinteract from a shell (such as IPython or the builtin interactive mode) is that you can go back and edit expressions you entered earlier and the results will flow through the part of the worksheet after the changed portion.
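
The worksheet idea in that description can be illustrated with a toy sketch. This is not how Reinteract itself is implemented; the `Worksheet` class, its methods, and the cell format below are hypothetical, written in JavaScript only to show what "results flow through the part of the worksheet after the changed portion" might mean in practice.

```javascript
// Toy worksheet: each cell is a function from the environment so far
// to an object of new bindings. Editing a cell re-runs it and every later cell.
class Worksheet {
  constructor() {
    this.cells = [];
    this.envs = [];  // envs[i] is the environment after evaluating cell i
  }

  // Append a cell and evaluate it.
  add(cell) {
    this.cells.push(cell);
    this.recompute(this.cells.length - 1);
  }

  // Replace an earlier cell; results flow through everything after it.
  edit(index, cell) {
    this.cells[index] = cell;
    this.recompute(index);
  }

  // Re-evaluate cells from the given index to the end, reusing earlier results.
  recompute(from) {
    for (let i = from; i < this.cells.length; i++) {
      const before = i === 0 ? {} : this.envs[i - 1];
      this.envs[i] = Object.assign({}, before, this.cells[i](before));
    }
  }

  // Environment after the last cell.
  get env() {
    return this.envs.length ? this.envs[this.envs.length - 1] : {};
  }
}

const ws = new Worksheet();
ws.add(() => ({ x: 2 }));
ws.add(env => ({ y: env.x * 10 }));
ws.add(env => ({ z: env.y + 1 }));
console.log(ws.env.z);  // 21

ws.edit(0, () => ({ x: 5 }));
console.log(ws.env.z);  // 51
```

Cells before the edited one are not re-run; only the edited cell and the ones after it are, which is the difference from retyping everything into a shell.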

Thanks. I was planning to

Thanks. I was planning to implement something like that! Excellent work.

It doesn't fit everything,

But I wondered about the same thing some weeks ago (for teaching an introduction to programming) and I concluded that JavaScript + HTML5 isn't that bad at it. I tested the idea and ended up coding a fractal tree example which shows recursion, some DOM coding, and some elementary graphics in just about twenty lines. And the good thing is that no install is necessary, except for an HTML5 (canvas) enabled browser.

[For the hell of it, code is below. If Drupal supported HTML5 snippets, it would actually run inside your browser now.]

function draw_branch(xpos0, ypos0, xpos1, ypos1) {
    var canvas = document.getElementById("ftree");
    var ctx = canvas.getContext("2d");

    // Origin of the tree in canvas coordinates (canvas y grows downward).
    var originx = 225;
    var originy = 250;

    // Start a fresh path so earlier branches are not stroked again.
    ctx.beginPath();
    ctx.moveTo(originx-xpos0, originy-ypos0);
    ctx.lineTo(originx-xpos1, originy-ypos1);

    ctx.strokeStyle = "#0e0";
    ctx.stroke();
}

function draw_tree(n, xpos0, ypos0, alpha, length) {
    // Stop after ten levels of recursion.
    if (n > 9) return;

    // alpha is in degrees, measured from the vertical.
    var xpos1 = xpos0 + length * Math.sin(alpha * Math.PI/180);
    var ypos1 = ypos0 + length * Math.cos(alpha * Math.PI/180);

    draw_branch(xpos0, ypos0, xpos1, ypos1);
    draw_tree(n+1, xpos1, ypos1, alpha-25, length*0.8);
    draw_tree(n+1, xpos1, ypos1, alpha+40, length*0.6);
}

// Assumes the page contains a <canvas id="ftree"> element.
draw_tree(0, 0.0, 0.0, 0.0, 60.0);


Three basic concepts stand

Three basic concepts stand out to me: tangible functional programming, programming-by-demonstration, and aspect-oriented FRP (MzTake, as Jules pointed out). There might be a really exciting mix between the first and the last!

And... at the end of the day, I'll still use Processing or js+canvas ;-)

Weka

Have you looked at Weka?

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.

The Weka Knowledge Explorer lets you visualize and explore the algorithms. A couple of the original authors also have a pretty popular book that uses Weka for exploring data mining algorithms.