In search for a programming language to replace spreadsheets.

Hello,

I'm in the search for a good programming language to replace much of what I use spreadsheets for today:
It should be constructed to easily define constants / sets of data, and calculating new data from other data, using user-defined formulas.

The advantage of a spreadsheet is that it's extremely easy and fast to work with. But a spreadsheet can soon become complex with:
- lots of long, hard-to-read formulas
- the same formula definitions duplicated all over the document
- data and formulas mixed up all over.

I would like a language that can be used like this:

You define a data field to be a function of multiple arguments like:

Amount Income(Company, Month, Year)

You can then derive other fields like:

Amount Income(Month, Year) = sum(Income(Company, Month, Year))
Amount Income(Year) = sum(Income(Month, Year))

You can also define a lot of data:

Amount Income(Company, Month, Year) = [
  ("Microsoft", "01", "2008") => 1234,
  ("Microsoft", "02", "2008") => 15664,
  ("Sun", "01", "2008") => 1564,
  ("Sun", "02", "2008") => 1652
]

You can now ask for

Income("2008")

and get the appropriate amount: (= sum [for all months] [for all companies] Income(Company, Month, "2008"))

I know many imperative languages, but I've only experimented very little with other programming paradigms. I'm thinking of learning a declarative language like Prolog for this .. but are there better alternatives? I like the idea of Haskell being strongly typed, but I don't know if it will be good for this.

The idea is to be able to model very complex systems, while keeping the calculations concise. Another advantage would be that you are able to reuse formulas. Ideally there should be a nice IDE with the possibility of getting a graphical/tabular view of the calculated data.

Have any of you thought of something like this, or even made or heard of something like this?
I would very much like to know your views on this, and if you know any good languages for this problem domain..

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

QCalc

QCalc is a spreadsheet program with the Q programming language under its hood. It lets you run all Q functions, no strings attached, and you can define your own functions in a script and use these in the spreadsheet, too. If you've ever worked with MS Excel or its OpenOffice and KOffice equivalents, you should feel right at home. The only major difference is that each cell may hold either text, a number or any Q value.

Quantrix Modeler

Quantrix Modeler (quantrix.com) is a "second-generation spreadsheet" in the tradition of Lotus Improv that provides the facilities you ask for but within a spreadsheet metaphor. Rather than dealing in rows and columns, you deal in dimensions with named variables. Above all, the equations are written down separately, not hiding behind individual spreadsheet cells, and they look just like your examples.

It's unfortunately not free, but you might want to look at it anyhow. (Disclaimer: I'm just a satisfied Quantrix customer, no other relation to them.)

Lotus Improv

I used Lotus Improv back in the day (on NeXT boxes, where it originally shipped). I never did understand why the model didn't catch on. Much clearer model than spreadsheets for most uses.

This may be one of those cases where "X is a better Y" is a sellable pitch, while "Z is better than Y" is not. In this case, X=MS Excel, Y=Lotus 1-2-3, and Z = Lotus Improv. These happen if Y is very well liked, but has required significant training.

Lotus Improv

Lotus Improv is much easier for programmers & other people who routinely deal in abstractions. Not so much for casual users, who generate most spreadsheets. Also not so good for people who are exploring datasets.

I'd estimate (based on my personal usage) that at least half of all spreadsheets are ad-hoc databases, thrown together without very much forethought.

But back on topic...

Basically, Lotus Improv/Quantrix are multi-dimensional data-flow languages, aggregation over dimensions as built-in primitives. No other looping, and the only conditionality is at the expression level. Very pretty to work in, and excellent for financial analysis and budgeting use cases.

Sniff, sniff...

I found something on sourceforge that looks promising from a theoretical standpoint. It' written in JavaScript (not Python) and called Objectsheet. It allows tables to represent sets of objects. It utilizes AJAX and reads/writes/queries XML. Being written in JavaScript, it is provided fully self-containted in one HTML page!

Scheme in a Grid?

SIAG probably isn't quite what you are looking for, but it's a spreadsheet based on Scheme. Unfortunately the code isn't the most modern, I don't like the GUI much, and the Scheme implementation it uses isn't the greatest. So I haven't had the patience to really dig in, though I very much like the concept.

The beauty here is that it would reduce your complaints about spreadsheets. (which I would tend to share) Scheme offers greater flexibility and abstraction capabilities.

Don't laugh

OK, don't laugh, but SQL is almost exactly what you are asking for. Seriously. It has a reputation for being uncool, but in its problem domain it is quite good, and you're describing its problem domain.

SQL

Straight out of the box ANSI-ish SQL has no concept of user defined functions like the OP was talking about. The typical built-in stored procedure languages for popular SQL RDBMSs are first-order procedural languages which are meant to be easy to learn by programmers with only scripting or C experience but which are, frankly, otherwise poorly suited to their domain.

The desire for

The desire for bidirectionality implicit in a spreadsheet system suggests full-blown SQL isn't a good match: the updateable view problem hits hard. Datalog (the poster mentioned prolog), with a better syntactic layer and some rethinking for GUI semantics, seems like a solid (and traditional) starting point...

Ordering

The correspondence between spreadsheets and relational databases is interesting. When I think about this it's easy to forget that there's an implicit ordering relationship among spreadsheet rows. How would your sample syntax addresses that? Some of the cruft from sql comes from dealing with ordering of result sets (order by.. ascending/descending etc...) Also, like SQL, seems like the users may want to create esoteric 'datafield' relationships like foreign keys and RI. I do like the concept of a language syntax to abstract over sql tables and rows so I don't have to deal with raw queries or object mappings. A limited subset of sql semantics in a spreadsheet metaphor? I like this idea... How does LINQ fit into this?

Wow! A lot of nice suggestions

I never thought of using a RDBMS (and SQL) for data analysis like this. I've used plenty of SQL for "data persistence" purposes, but it seems like a nice idea to use it just for data analysis. One might need to use views/stored procedures to build in formulas, though.

SQL isn't ideal though - the syntax is not always nice to work with. But I think I'll try to start out with PostgreSQL for this, and see how far I get, and what shortcomings I stumble upon. I'm not much interested in scalability but rather in "ease of use" for this purpose, so if you know a better DBMS for this, I'm happy to hear about it :-). A RDBMS more integrated with a real programming language could also be interesting.

QCalc also looks very interesting, and I'll definitely look into this as well. I like the idea of having both a "script view" and a "sheet view" for your data analysis. The Q language seems to have a very short and concise syntax as well.

SQL is certainly possible to use

I just tried to model what I wanted with MySQL, using only tables, views and a few user-defined functions. To do this, I had to spend some time getting used to working with the relational model for purposes like this.

My tips for using SQL effectively for this, is to:

  • program the complete schema (tables+views) in a .sql-file, including statements to drop and recreate the schema. In that way, you can easily make substantial changes, without being limited by a DB-Manager GUI. It's easy to import a file with SQL-statements to the DB from the command line
  • Also put testdata in a .sql-file.
  • Don't be afraid of doing huge refactorings all the time. This isn't a DB with a public API..
  • Don't be afraid of adding a lot of views. Use views instead of subqueries - they're just as easy to handle in the SQL file anyways! I ended up with about 3 tables and 12 views. It makes the SQL-queries easier to comprehend, and allows more code-reuse.

The result is very usable - I've got all the information I wanted. But I can't help thinking that there "must be a better way" to program stuff like this :-).

My early experiences with the SQL-approach for data modeling is:

Pros:

  • Everybody knows SQL, and a lot of tools exist to make the data from an RDBMS widely available
  • Easier than spreadsheets. And with more complex projects, it's much easier

Cons:

  • The views and tables are tightly coupled, even though it's not that hard to refactor everything in the text editor.
  • I can easily imagine doing tradeoffs between optimal performance and code duplication. Most of the code (= joins and expressions) resides in views, and the views can be constructed differently, dependent on how you want to use the data.
  • When extending the model with a new parameter, you likely have to change a lot of views.

For upcoming projects, I think I'll try out QCalc, and maybe look a bit at the commercial solutions available.

Databases

Not surprisingly SQL was already suggested. I think you're looking for a kind of deductive database. You may be interested in datalog.

FlapJax

You can do something like what you want with FlapJax, which gives you a functional reactive environment in JavaScript. Since it's wrapped up inside of a web page, you can easily do a kind of "literate spreadsheet programming".

You'd do your programming with the FlapJax core, and use its DOM library to actually show the results of the computation.