The Development of Sage

Sage is a project to create a viable free open source alternative to Magma, Maple, Mathematica and Matlab. The lead developer/manager, William Stein, has recently written Mathematical Software and Me: A Very Personal Recollection, a rather enjoyable story of his experience with mathematical software, especially Magma, and how Sage came to be.

One of the difficulties of writing broadly useful math software is the sheer size and scope of such a project. It is easily outside the abilities of even the most prodigious lone developer. So the focus of Sage, at least up until recently, has been on creating Python-based interfaces to existing mathematical software. For example, for symbolic calculation the Sage distribution includes Maxima (written in Common Lisp), a fork of Macsyma dating back to the early 1980s, and released as open-source software by the US Department of Energy approximately 10 years ago. In addition to Maxima, Sage includes the ability to call out to Magma, Mathematica, and Maple.

There are some interesting PLT-related snippets. For example, Magma's language is frequently criticized, although its algorithms are frequently praised. In conversations with others, OCaml and Haskell were brought up, but William Stein chose Python because he felt that it was more accessible. Also, Axiom, which includes the dependently-typed language Aldor, was rejected in favor of Maxima because Maxima was less esoteric and much more widely used.

Octave

There is also GNU Octave... as another open source alternative.

Different animals

Octave and Sage are different animals. Sage's goal is to be a replacement for Maple, Mathematica, and Matlab. It provides numerical tools (largely via SciPy), but also a wide range of tools for symbolic math. Octave's goal is somewhat narrower, being primarily to provide a Matlab-compatible numerical computation system. There's apparently a Sage interface to Octave (although I haven't tried it), in case you want to use Octave to do numerical stuff from Sage instead of using Sage's built-in numerical libraries.

[Edit: "replacement" probably wasn't the best term to use. "Substitute" might have been better. Or some other word that indicates similar functionality but not syntactic compatibility.]

Replacing Mathematica with

Replacing Mathematica with something freely available seems more of a challenge. Are there any good alternatives (hopefully ones that can read Mathematica files)?

Comparable functionality, not a re-implementation

My understanding is that Sage strives to replace the aforementioned systems in the sense of offering comparable functionality, not a replacement in the sense of GNU Octave, which strives to be more-or-less a drop-in replacement for Matlab.

And, Sage tries to offer a uniform interface to Mathematica and Maple and Maxima so that these systems can be more easily compared. I don't know how big the uniform interface is, and how often you need to fall back to a CAS-specific functionality.
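To make the "uniform interface plus fall-back" idea concrete, here is a minimal sketch in plain Python. It is not Sage's actual interface code: `ToySystemA`, `ToySystemB`, `Wrapper`, and the command templates are all hypothetical stand-ins for external systems that only understand their own command strings.

```python
import math
import re


class ToySystemA:
    """Hypothetical external system whose syntax is factorial(n)."""

    def eval(self, command: str) -> str:
        match = re.fullmatch(r"factorial\((\d+)\)", command)
        if match is None:
            raise ValueError(f"ToySystemA cannot parse: {command!r}")
        return str(math.factorial(int(match.group(1))))


class ToySystemB:
    """Hypothetical external system whose syntax is Factorial[n]."""

    def eval(self, command: str) -> str:
        match = re.fullmatch(r"Factorial\[(\d+)\]", command)
        if match is None:
            raise ValueError(f"ToySystemB cannot parse: {command!r}")
        return str(math.factorial(int(match.group(1))))


class Wrapper:
    """Uniform front end: same method name, system-specific command template."""

    def __init__(self, system, factorial_template: str):
        self._system = system
        self._factorial_template = factorial_template

    def factorial(self, n: int) -> int:
        # Uniform call: the caller never sees the system's own syntax.
        command = self._factorial_template.format(n=n)
        return int(self._system.eval(command))

    def raw(self, command: str) -> str:
        # Escape hatch: pass a system-specific command straight through.
        return self._system.eval(command)


system_a = Wrapper(ToySystemA(), "factorial({n})")
system_b = Wrapper(ToySystemB(), "Factorial[{n}]")

# Same call on both systems, despite their different native syntax.
print(system_a.factorial(5), system_b.factorial(5))
```

The open question in the comment above is essentially how much work ends up going through something like `raw` rather than through the shared methods.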

Evolutionary dead-ends

In many fields (e.g. machine learning), Matlab is a sort of lingua franca. Octave's major advantage is that it can run a colleague's Matlab code with almost no changes. You should really ask yourself how many people are looking for yet another numerical programming language.

Punctuated equilibrium

You should really ask yourself how many people are looking for yet another numerical programming language.

I'm not entirely certain why you addressed that question to me. I'm not a Sage developer, nor even really a Sage user (more of an occasional dabbler). However, since you've asked: anecdotally, I've observed that SciPy is becoming increasingly popular (both in academia and in industry), so the answer to your question may be "more than you think". That said, I'm personally a heavier user of Matlab than SciPy at the moment. Although I go back and forth between them (and sometimes Octave) depending on what I need to get done, Matlab clearly has a richer set of domain-specific toolboxes than SciPy at this stage. That obviously makes it more attractive for certain applications.

All of that's rather beside the point though. As I mentioned in my earlier comment, Sage isn't intended to be a direct drop-in replacement for Matlab. It's not even intended to be a numerical programming language. Sage is intended to provide a common Python-based interface to a variety of existing open-source (and some commercial) mathematical tools. Sage incorporates packages to tackle problems in (quoting from the Sage website)

...algebra, calculus, elementary to very advanced number theory, cryptography, numerical computation, commutative algebra, group theory, combinatorics, graph theory, exact linear algebra and much more.

In other words, it has a different scope than Matlab. Whether or not there's sufficient demand for that scope to enable Sage to avoid being an "evolutionary dead-end" remains to be seen.

"incorporating packages"

I have to admit I am still not entirely sure I understand the model... Is the idea to have a single language (essentially Python), which uses the APIs of all these packages under the hood, or merely to allow users to embed calls written in Mathematica, Matlab etc? I figure it's the former, which would mean that as a language Sage does directly compete (or has to keep up) with all these languages.

The Sage model

I believe that the idea is to have a single "language" that uses package APIs under the hood (although that hasn't been fully implemented yet, so there are situations where you end up making an embedded call to the underlying package). My response to Sean was focused on his question regarding the need for a new numerical programming language - the point being that Sage isn't intended to be "just" a numerical programming language (let alone a Matlab-compatible one). That's apparent just from the name, as reported in the article by Stein that Leon started this thread with:

SAGE = Software for Algebra and Geometry Experimentation

So Sage is -- or started as -- a language for algebra and geometry rather than for numerical work. In that sense I guess it's competing with packages like Mathematica. Will it actually be competitive? I don't know. I'm not a mathematician, so I'm not exactly in the target market. From a language perspective, I'd imagine that being able to use a familiar, full-featured programming language (note the various complaints about lack of features in Magma's language, or about the difficulties of GUI programming in Matlab) would be a big selling point. I gather that the non-proprietary nature of the software is also a big plus for mathematicians concerned about their freedom to conduct research. Maybe that'll be enough to tip the balance in favour of Sage.

I'm not entirely certain why

I'm not entirely certain why you addressed that question to me.

I meant it for the broader "community" of people looking at yet more dialects for numerical computation. From my understanding of your comment, Sage is intended to make "Python + some libraries" a preferred environment for numerical programming. This effort seems pointless: each community's algorithmic knowledge is far more important than some syntax. I hate Matlab's OO and string handling, but I accept both when I want to use someone's numerical code.

I don't use Matlab as it is

I don't use Matlab as it is not aimed at the areas that I work in, so my knowledge of it is entirely second-hand. But I do hear from people that its language is pretty awful, and a pain to work in (or around). I have played with Sage a lot, and it is a very nice environment to work in. Python is a nice language for hacking things up in, the APIs are generally quite sensible, and the notebook interface is very cool for exploratory work.

Having said that; for the domains that I work in I tended to use Magma, Maple, Graphviz, Gnuplot and lots and lots of glue. So in some sense I guess I am the target audience for Sage. And I haven't picked up either Magma or Maple in over a year. Working in Sage is definitely a good replacement for them. Matplotlib has won me over from gnuplot, and the sense of integration at having everything in one language means that Sage is far from pointless. I still write glue, but less of it, and everything being glued, and the glue itself, is in Python which makes it easier.

I don't understand your claim of pointlessness. If I accept your argument that the important thing in each community is algorithmic knowledge then you still seem to be assuming that old code is more important than new code. As a researcher I don't mind losing access to code in my domain - as you say the knowledge to recreate it is more important - but I do value an increase in productivity in writing new code. Certainly it is a different trade-off for everyone, but I spend more time working with new code, than working with old code. I can see the point in a project that makes that task easier.

RE: Evolutionary dead-ends

In my experience, the real bottleneck is that most statistical analysis is provided to scientists as "packages" that all scientists share. I can't remember who made the quip (I think it was Richard Feynman), but it goes something like "Scientists of the 19th century shared each other's research methods; scientists of the 20th century shared each other's FORTRAN bugs". Substitute C++ (via the Fermilab/CERN Scientific Linux environment targeted at physicists) for FORTRAN, and you've got what today's supercollider experiments' statistical analyses are built on.

There was a really nasty but academic-standard MATLAB package I had to use when doing human brain image analysis. I had to modify some of its internal subsystems so that the proper hooks were provided for what I wanted to do (I spent days just reading the code before I made any changes, for fear of introducing a bug). This project was started by some neurology major, and it morphed into a huge academic project with many contributors.

"Packages"

the real bottleneck is that most statistical analysis is provided to scientists as "packages" that all scientists share.

This is a *feature*. The idea is to create a single robust, reliable version of things that are tricky to get right (e.g. because of numerical stability), then encourage everyone to use it. I would rather have everyone share the same small set of FORTRAN bugs than have everyone fight with his own larger set of FORTRAN/C++/Sage bugs.
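A small illustration of the "tricky to get right" point, using only the Python standard library. `naive_sum` is a hypothetical hand-rolled helper written for this example; `math.fsum` is the shared library routine that tracks the rounding error lost by plain floating-point addition.

```python
import math


def naive_sum(values):
    """Hand-rolled left-to-right floating-point summation."""
    total = 0.0
    for value in values:
        total += value
    return total


values = [0.1] * 10

# The hand-rolled loop accumulates rounding error and misses 1.0.
print(naive_sum(values) == 1.0)

# The shared, carefully written routine returns the correctly rounded sum.
print(math.fsum(values) == 1.0)
```

Ten additions are already enough for the hand-rolled version to drift, which is the kind of detail a shared package gets right once for everyone.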

It can be both a feature and

It can be both a feature and a bottleneck...

Sage Component List

Allan is right, Sage's goal is more ambitious than "just" replacing Matlab. Here's a pretty impressive list of the components of a Sage distribution.

Of course the best option is

Of course the best option would be to use Scheme (Clojure?!) with decent math libraries...

Incanter

New but promising, Incanter is Clojure plus math/statistical/graphing libraries: http://incanter.org/

Some of the early contributors are interested in using this on top of the scaling infrastructure that is popular in the Java world, such as Hadoop and Cascading.

Wow! This sounds very

Wow! This sounds very exciting.

Incanter is pretty cool, but

Incanter is pretty cool, but I'm a bit concerned and puzzled by it, in that Lisp-Stat failed pretty miserably. By "miserably", I don't mean that it was miserable, just that it had a lot of promise and never got anywhere.

I have the impression that the statistical community has already more or less voted on its preferences for a Lisp-like statistical language. Those who experienced that period, were exposed to both Lisp-Stat and R, and watched R rise so prominently while Lisp-Stat was abandoned will probably be really reluctant to adopt a modern clone.

Then again, maybe R's popularity will highlight its shortcomings and lead to widespread interest in Incanter.

Back to the Future: Lisp as a Base for a Statistical Computing..


Back to the Future: Lisp as a Base for a Statistical Computing System

Essentially, the latest developments in statistical languages are that the people who founded R/SPlus now think that a Lisp-like language might be the best direction for the statistical language community.

I don't know if this really addresses your objections.

Another counter-point would be in physics research. Per Bruun recommended that physicists stop coding in Fortran and move to C++, during a time period when physicists were increasingly frustrated with standard Fortran environments' poor support for dynamic activation records and other neat tricks that a programming language can support. (To do dynamic memory tricks, they would often use buggy third-party software like Winteractive that would leak memory and was very unstable.) A lot of physicists at Fermilab and CERN dismissed Per as an old timer who didn't know what he was talking about. So he quit the committee and went on his own, and beat his dissenters back with a working implementation that addressed physicists' needs, while the other project stalled and never really got off the ground due to what were effectively bike shed arguments.

....apologies if this history is a little off. Parts of the story were supplied to me by a dude who manages a supercollider, and are thus second-hand.

I guess that paper deserves

I guess that paper deserves its own thread (hint, hint)...

Paper & Lecture Slides

Take it easy with the elbows ;-)

I honestly thought it had already been posted to LtU, and right before I added it I searched the archives and didn't find it.

I'm aware of that paper by

I'm aware of that paper by Ihaka and Lang. It's very interesting in a number of ways, and deserves more widespread attention.

I guess my reaction to that paper, reinforced shortly afterwards with Incanter, was that many of these issues were being discussed by Tierney a while ago with Lisp-Stat. I thought Lisp-Stat had a lot going for it, and while it gained some traction, it lost it really quickly.

So now we have Incanter, and I'm really cautious about it, because I feel like I've been there before. I don't know if people were turned off by a Lisp-style syntax in the statistical community, or really liked R, or both, but I feel like it's worth revisiting why Lisp-Stat didn't gain traction before pushing a statistical Lisp all over again, if for no reason other than to avoid the same mistakes. It seems all the more important given that (1) other viable options like Haskell, Python, or Scala (or maybe even Fortress someday?) might fill some of those gaps (whatever they might be), and (2) R is even more popular than it was during Lisp-Stat's peak.

Math systems that "understand" (or at least "enforce") math

So many here are so much more knowledgeable about type systems, Haskell, OCaml, etc., than I am. I usually choose not to comment.

But on this subject I have a strong opinion: we need math systems (a la the systems already named) that "enforce" (or "understand," in some sense) the mathematics.

The interview a short time ago with Cleve Moler, where he admitted that "Matlab has one type--the matrix" (my wording), contrasts with a recent posting by Dan Piponi (sigfpe) on his blog, where he talks about using the Haskell type system to build systems which "act as" (my wording) certain types of spaces, or types. ("Types are spaces," as one version of the Curry-Howard Correspondence puts it.)

I'm impressed with how sufficiently typeful systems can "capture" the properties of the spaces that "quantum objects" live in, thus enforcing a lot of the physics and relieving the programmer of some of the burden of having to enforce all sorts of behavior on the one type (e.g., the matrix in MatLab, or simple arrays, or lists) he has in his language.

Several years ago I read some papers by Jerzy Karczmarczuk about the connections between lazy functional languages and quantum mechanics, and also some papers on things like (like, because there may be more than one of these!) "QML," described in this URL:

http://fop.cs.nott.ac.uk/qml/compiler/

This is really exciting, the idea that very high level languages like Haskell can be used to capture (and "enforce," in the same sense that we view "objects" in traffic simulations or in animal hierarchies as "enforcing" certain rules of the road) properties of quantum-mechanical systems.

(Never mind that for an actual, production-oriented calculation of something like a hydrogen atom or the like one would likely use FORTRAN or C++ or MatLab or something equivalent for pure performance reasons. This is not the point here.)

I also need to mention the related work of Martin Escardo, Steven Vickers, and others. The idea of building structures which behave as topological spaces ("types are spaces") is quite exciting. After all, even in lower-level languages with arrays and vectors, we are essentially using these languages to properly "manage" linear algebra objects. Ultra high level languages are just raising the bar.

So that's what I hope to see in my lifetime: systems which allow all of the known mathematics we have to be adapted into software systems as well as we can manage. (Sorry about the phrasing...I suppose I'm talking about "worlds made out of math.")

For me, having had horrible experiences with FORTRAN IV (40 years ago), then Pascal, then a smattering of C, then LISP (on a Symbolics 3670), then retrograde motion with Mathematica, the power of Haskell is just a revelation.

I can't wait to see what's next.

--Tim May

"Scientific computing"

My informal observations indicate that scientists will use the tool that is (a) most widely used around them and (b) easiest to get their hands on. I guess that's not a revelation to anyone... As a consequence different cliques/disciplines use different tools. Most of these tools are in the Matlab genre rather than the Haskell genre (if you get my drift). The more DSLs are used, the better chance there is to retrofit type systems and other support tools. This is one reason why the adoption of R is encouraging. Other areas, based on anecdotal evidence, are doing worse.

I agree, most will use the

I agree, most will use the tools they find ready to hand, around them, taught in their classes, etc.

But it's important that some fraction use tools that are actually more useful, even if less common.

A couple of years ago (it seems) I mentioned that at my company a very uncommon programming language was adopted for design. This was "MainSAIL" (or maybe "MainSail"), a language used for CAD of ICs that was based on the SAIL language out of the Stanford AI Lab, ergo the name.

Obscure, yes. But it probably gave my company a few years' lead over companies still struggling with FORTRAN or even ALGOL or (the still primitive) Pascal or the (dead on arrival) PL-1.

Intel used it from the 70s through the 80s. (I wouldn't be surprised if it was still being used in pockets into the 90s.)

I think some recent examples where companies used Python or Ruby for "social apps" are also examples of this.

Not sure of the PLT connection, so I'll stop here. But it says that even obscure languages have places to shine, sometimes not clear to the outside world.

For the record, I think that

For the record, I think that in many cases their choice of tools is a reasonable pragmatic decision, even if personally I prefer cooler packages.

Edited to add: The following quote from the article is an example of the issues I had in mind: "I then realized that if I did this, I would have to do it by myself, since almost everybody I knew used Magma, and would consider my plan too difficult and pointless. I wouldn’t get to do number theory for years. My spirit broke."

Type programming can model abstract math & don't underrate C++

Pragmatically, if I wanted something like Sage, I'd work on extending Macsyma into what I wanted - it's extremely powerful and also open source. On the other hand, a great test for a modern declarative language would be whether it was a good target for a clean rewrite of Macsyma. Personally, I'd pick Curry over Haskell for such an endeavor, but even C++ would IMNSHO be far superior to Python because of the former's declarative metaprogramming features.

There have long been C++ template metaprograms which model the mathematical properties of expressions at compile time to do things like dimensional analysis, number theory, partial evaluation, etc. Then you've got boost and blitz++ to do any purely numerical work.
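For readers who haven't seen dimensional analysis done in code, here is a rough sketch of the idea. Note the swap: this is a run-time check in Python, not the compile-time C++ template technique described above, so errors surface when the offending line executes rather than when the program is built. `Quantity`, `METRE`, and `SECOND` are hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A value tagged with exponents of (metre, kilogram, second)."""

    value: float
    dims: tuple

    def __add__(self, other):
        # Addition only makes sense for matching dimensions.
        if self.dims != other.dims:
            raise TypeError(f"cannot add dimensions {self.dims} and {other.dims}")
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other):
        # Multiplying quantities adds their dimension exponents.
        dims = tuple(a + b for a, b in zip(self.dims, other.dims))
        return Quantity(self.value * other.value, dims)

    def __truediv__(self, other):
        # Dividing quantities subtracts their dimension exponents.
        dims = tuple(a - b for a, b in zip(self.dims, other.dims))
        return Quantity(self.value / other.value, dims)


METRE = (1, 0, 0)
SECOND = (0, 0, 1)

distance = Quantity(100.0, METRE)
duration = Quantity(9.58, SECOND)

speed = distance / duration  # dimensions become metre^1 * second^-1
print(speed.dims)

try:
    distance + duration  # metres plus seconds is rejected
except TypeError as error:
    print(error)
```

The C++ versions encode the same exponent bookkeeping in template parameters, so the "metres plus seconds" mistake is rejected by the compiler instead of raising at run time.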

I'd love to see some of my favorite purely declarative languages develop to where they could compete in every way with procedural languages. I'm always saddened to hear people choosing procedural languages because they think that modern languages are too exotic!

_Greg

Oops, I meant Maxima, not Macsyma!

Too much history there!

_Greg

Application

I just used Sage in a blog posting.
http://mobjectivist.blogspot.com/2009/12/monte-carlo-of-dispersive-discoveryoil.html
Interesting to see how it gets applied.

I was doing mainly Monte Carlo sims so I actually ran the majority of the trials in a compiled language, of course in one of Ehud's favorites -- Ada.

My next encounter with

My next encounter with mathematics software was in early 1994 when I became a mathematics major, after accidentally encountering an abstract algebra book misfiled under computer science in a used bookstore, and being instantly mesmerized by ideas such as groups, rings, and fields.

This I like...

Another interesting quote: I

Another interesting quote:

I asked Cannon why it was so far behind, and he explained that the grants he was able to secure simply wouldn’t pay for language design.

I leave it open how this applies to current efforts.

I leave it open how this

I leave it open how this applies to current efforts.

By "language design" Cannon meant the core programming language, including features such as "user defined types", "exception handling", etc. Cannon's remark is relevant to Sage in at least two ways:

  1. For the Sage project, for the most part language design in the above sense is done via the Python community. Because Python is a general-purpose, widely used language, it receives strong financial support from industry, including Google, which heavily uses Python. If Sage developers want to do language work that impacts Sage, they do it through improving Python (or Cython). This means that the funding constraints Cannon had for Magma do not apply, since the range of funding sources for Python is much broader. (Sage does have a minimal preparser, but that's trivial in comparison to the serious language design issues like the ones I was discussing with Cannon.)
  2. The Sage project is currently organized in such a way that it is not fundamentally dependent on grant funding. Most people (and there are about 200) who work on Sage receive no financial compensation for their efforts -- they work on Sage because they very strongly believe in the value of the project and find it personally useful for their work. For example, our current amazing release manager, Mike Hansen, has been working on Sage for the last 6 months as a volunteer, and our main technical editor, Minh Nguyen, is an undergraduate at the University of Melbourne who is also currently not paid to work on Sage. As a community we can do whatever we feel should be done, without being beholden to funding agencies (or paying customers). That said, we love to get user feedback, and we have received financial support (from Google, Sun, NSF, Microsoft, DoD, etc.), which we're very, very grateful for--it just isn't something we've come to depend on.
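To give a feel for how small a "minimal preparser" can be (item 1 above), here is a toy sketch. It is not Sage's implementation: `toy_preparse` is a hypothetical function that only imitates two rewrites the Sage preparser is known for, turning `^` into `**` and wrapping integer literals in `Integer(...)`. Plain `int` stands in for Sage's own integer type, and the regular expression deliberately ignores floats, strings, and comments.

```python
import re


def toy_preparse(source: str) -> str:
    """Rewrite math-style input into plain Python source (toy version)."""
    # In plain Python, ^ is bitwise XOR; mathematicians expect exponentiation.
    source = source.replace("^", "**")
    # Wrap standalone integer literals so a richer integer type can take over.
    return re.sub(r"\b(\d+)\b", r"Integer(\1)", source)


rewritten = toy_preparse("2^3 + 1")
print(rewritten)

# Evaluate the rewritten source with int standing in for the Integer type.
print(eval(rewritten, {"Integer": int}))
```

Everything after this rewriting step is ordinary Python, which is why the language-design burden stays with the Python community rather than with the Sage project.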

Welcome to LtU! Thanks for

Welcome to LtU!

Thanks for the insider view on Sage and funding. These are interesting and important points. My intent was to focus on those cases that do require language engineering, that is, the cases in which the features of the general-purpose language (if the domain-specific language is embedded, as in Sage) are not enough. This may be relevant to Sage (as I mentioned above, I am unsure about the semantic model), if you need domain-specific extensions to core Python semantics. My comment was meant to hint at other projects as well, in which a domain-specific language, embedded or otherwise, may require language design work to evolve (e.g., R, and the discussion here).