LLVM 1.5 has been released!

Hi All,

This is a quick note to say that LLVM 1.5 is out, with many new features above and beyond LLVM 1.4. Perhaps the biggest feature of interest to this community is full support for proper tail calls, as described in the release notes. More details about the release can be found in the release notes and in the two status updates [1,2] since 1.4.

For those not familiar with LLVM, it is a compiler system that can be used to build a wide variety of compiler and language systems. It provides language- and target-independent tools for building static compilers, interprocedural optimizers, JITs, etc. If you are working on a new language and need a code generator, you should check it out. LLVM can be used in two ways: 1) link to the libraries we provide for direct access to the APIs, or 2) emit the LLVM IR as a text file. People have even written (toy) languages in Perl using the second technique.
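To make the second technique concrete, here is a minimal sketch of a toy front end that emits LLVM IR as plain text, with no linking against the LLVM libraries. It is written in Python rather than Perl, the function name `compile_rpn` and the reverse-Polish input format are made up for illustration, and it assumes the textual IR syntax of much later LLVM releases (`define i32 @name()`), which is not the syntax LLVM 1.5 accepted.

```python
def compile_rpn(tokens, name="calc"):
    """Translate a reverse-Polish token list into the text of one LLVM IR function.

    Hypothetical helper for illustration; assumes modern textual IR syntax.
    """
    ops = {"+": "add", "-": "sub", "*": "mul"}
    stack, body, counter = [], [], 0
    for tok in tokens:
        if tok in ops:
            # Pop the two operands; the right-hand side is on top of the stack.
            rhs, lhs = stack.pop(), stack.pop()
            counter += 1
            tmp = f"%t{counter}"
            body.append(f"  {tmp} = {ops[tok]} i32 {lhs}, {rhs}")
            stack.append(tmp)
        else:
            # Anything else must be an integer literal.
            stack.append(str(int(tok)))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    lines = [f"define i32 @{name}() {{", *body, f"  ret i32 {stack[0]}", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(compile_rpn("2 3 + 4 *".split()))
```

The resulting text would be written to a file and handed to the LLVM tools; the front end itself needs nothing beyond string formatting, which is why this route suits toy languages.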

-Chris

Congrats

Good work, Chris.

so sweet

llvm is just so sweet man

Thanks

:)

C--

How does this compare to C--?

RE: C-- vs LLVM

I'm not an expert on C--, but my impression is that LLVM is far more developed, generates better code, supports JIT compilation, supports more targets, has midlevel and interprocedural optimizations, has C/C++ front-ends, and has a much larger developer base (so it moves faster).

I'm sure that there are advantages to C-- over LLVM; perhaps someone else can comment about that.

-Chris

Very Good!

Having spent some time looking at both C-- and LLVM, I have to say that you've done an excellent job of summarizing LLVM's advantages, and with the 1.5 release I'll now have to take a new look at it.

The principal benefit that I see to C-- is the one that Peyton-Jones and Ramsey claim for it: it has a well-defined "runtime interface" that allows you to provide runtime systems, e.g. garbage collection, exception support, etc. that are tuned specifically to your language. However, once again to be fair, I haven't spent nearly enough time examining both systems in parallel to be confident in making specific claims of advantage here. It's probably about time to attempt a fair evaluation.

Preliminary Observations

  1. C-- currently requires a fair amount of infrastructure just to get going. On my almost-virgin Mac OS X 10.4.1 installation, I had to install Gerben Wierda's teTeX distribution and U of Arizona's Icon distribution in order to install Norman Ramsey's noweb distribution. I also had to install Lua. C-- wants 4.0, but that's a full 1.0.2 revisions behind, so I elected to do whatever I needed to do to get C-- to build with 5.0.2. Oh, and C-- also wants mk rather than make or omake, but it includes that in the distro. Anyway, with all of that taken care of, building C-- is almost straightforward with just a single edit to a single file to get Lua 5.0.2 working. However, it still fails to build for me due to a missing qc--interp.h file, which I can see is supposed to be generated from qc--interp.nw, but there's no hint of a rule to actually do such generation anywhere in the distro. "The distro" in this case is an rsync pull as of a few days ago. Not the most auspicious beginning.
  2. Attempting to build LLVM 1.5 in the same environment ran into similar, but much more minor, snags: ultimately I found it necessary to rebuild the cfrontend from source on Mac OS X 10.4.1. Following the instructions, this didn't prove to be problematic, and the remainder of the build and installation process went fine for me.
  3. I guess I don't quite understand what "make test" in $OBJDIR is supposed to do—for me, it shows me how it's temporarily reset my PATH to include a Scripts directory, and then says "true runtests" or somesuch, which, needless to say, does nothing.
  4. The distinct "llvm-test" module seems quite comprehensive. Unfortunately, it also appears to want a huge pile of infrastructure—"Expect" and all that—and I quickly abandoned any thought of actually running the suite.
  5. I see from the release notes that proper tail-call support is only optionally available for x86. Feh.
  6. Both C-- and LLVM now claim to be suited for precise GC, arbitrary exception semantics, etc. That is, the uniqueness claims that I believed C-- to be making might not be as unique as I'd been led to understand, so this investigation could be quite fruitful, once I get C-- to build and start to get my head around developing with both tools.
  7. LLVM appears to have a huge documentation and community advantage over C--.
  8. Anton had a fairly specific set of issues relative to LLVM in the past. It would be nice to hear how 1.5 does or does not address those issues, stipulating that you can enable tail-call support for x86; I trust that it will migrate to the other targets as well.

Generic/Gimple vs. LLVM

How does LLVM's intermediate representation compare with the Generic/Gimple representations of GCC 4.0?

RE: GENERIC/GIMPLE vs LLVM

First off, GENERIC is really an AST version of GIMPLE with a few minor twists. GIMPLE is like LLVM in the sense that it is a linearized 3-address-code-like representation.
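For readers unfamiliar with the term, a "linearized 3-address-code-like representation" means that nested expressions are flattened into a sequence of instructions, each with at most one operator and a named result. The sketch below shows the idea only; the tuple-based tree format and the `linearize` helper are invented for illustration and mirror neither GIMPLE's nor LLVM's actual data structures.

```python
import itertools


def linearize(expr):
    """Flatten a nested (op, lhs, rhs) tuple into three-address instructions.

    Returns the instruction list and the name holding the final value.
    Hypothetical helper; leaves are variable names or constants.
    """
    code = []
    fresh = (f"t{i}" for i in itertools.count(1))  # t1, t2, t3, ...

    def walk(node):
        if not isinstance(node, tuple):
            return str(node)  # leaf: use the name or constant directly
        op, lhs, rhs = node
        left, right = walk(lhs), walk(rhs)
        dest = next(fresh)  # every intermediate result gets its own temporary
        code.append(f"{dest} = {left} {op} {right}")
        return dest

    return code, walk(expr)


if __name__ == "__main__":
    # (a * b) + (c * 2)
    instructions, result = linearize(("+", ("*", "a", "b"), ("*", "c", 2)))
    print("\n".join(instructions))
    print("result in", result)
```

Because each temporary is assigned exactly once, the output also happens to be in SSA form for straight-line code; real IRs need extra machinery (phi nodes) once control flow is involved.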

The primary differences are these:

1. GIMPLE is only an in-memory IR; there is no way to write the IR out and read it back in. This means that you HAVE to write your front-end in the GCC framework and link to GCC. This is a real pain because GCC is not modular, and is intentionally designed to make it difficult to be so.

2. The LLVM IR is a well defined representation, whereas the GIMPLE representation is not. Even most GCC devs do not understand the full semantics of the GCC IR: there are many weird cases, special flags, and other things hanging around for historical reasons. The GCC IR is also notoriously non-type-safe and difficult to deal with.

3. LLVM offers many things that GCC/GIMPLE don't, e.g. true tail calls, JIT compilation, link-time optimization, an automatic debugger for optimization/codegen bugs, etc.

4. OTOH, GCC can generate code for far more targets than LLVM can. With LLVM you can use all of GCC's targets through the use of the LLVM C-backend, but this is slower than direct codegen, and you also obviously lose tail calls.

There are obviously many many differences between the two systems, but hopefully this will give a flavor for some of them.
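To illustrate what losing tail calls costs a language implementer: when the target cannot guarantee that a call in tail position reuses the caller's stack frame, a common workaround is a trampoline, where each function returns a thunk for its tail call and a driver loop runs the thunks. The sketch below uses Python only as a stand-in for such a target (Python itself does not eliminate tail calls); it is not the output of the LLVM C backend, and the function names are illustrative.

```python
def is_even(n):
    # Instead of calling is_odd directly, return a thunk for the tail call.
    return True if n == 0 else (lambda: is_odd(n - 1))


def is_odd(n):
    return False if n == 0 else (lambda: is_even(n - 1))


def trampoline(result):
    # Keep running thunks until a plain (non-callable) value comes back.
    # The stack depth stays constant no matter how many "calls" are made.
    while callable(result):
        result = result()
    return result


if __name__ == "__main__":
    # Direct mutual recursion to this depth would exceed Python's default
    # recursion limit; the trampolined version runs in constant stack space.
    print(trampoline(is_even(100000)))
```

The price is an extra allocation and an indirect call per step, which is the kind of overhead that proper tail-call support in the code generator avoids.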

-Chris

re: Generic/Gimple vs. LLVM

Thanks! I was wondering why I couldn't find more information on Generic/Gimple...I thought I didn't know enough about this stuff to search for it intelligently...glad to know I'm not the only one.

I also noticed your project came up often when I was googling for "Typed Assembly Language." Hopefully I'll learn more about it as I experiment with LLVM.

Lastly, I understand that LLVM targets C to gain access to a larger number of machines. Couldn't LLVM's intermediate representation target Generic/Gimple directly (bypassing C and its limitations completely) and gain access to all the architectures GCC has access to (and maybe even gain further optimization on their SSA)? I remember reading on Slashdot that there might be some licensing issues with this approach?

Anyway, this project looks nice, thanks!

falcon

Why not LLVM -> GIMPLE

The basic answer is that it provides nothing that going to C doesn't. At the same time, it would limit us to using GCC, not allowing ICC or SGICC or many other vendor compilers.

Also, LLVM has its own aggressive SSA optimizers and GCC provides none of the aggressive interprocedural/linktime optimizers that LLVM does. Really the only reason to go through GCC is for portability.

-Chris

gcc ssa more general than C

My understanding was that GCC's SSA representation is (by definition?) more general and does not have C's restrictions such as tail call whatever, multiple return values, etc.

RE: gcc ssa more general than C

It is slightly more general than C, designed to handle Ada constructs for example, but not in a way that can handle tail calls, multiple return values, or any significant extension. The GCC list can probably help answer any more questions about their IR though.

-Chris

GCC intentional obfuscation

GCC is not modular, and is intentionally designed to make it difficult to be so.

I have seen this before, but I don't remember where; can you give us a cite?

FSF policy

One of the pointers: A threat to Free Software?.

In short, making GCC modular would change the status of your extensions from "derived work" to just "communication partner", preventing any FSF control (meaning your extensions might be even proprietary, gasp).

PLTers for lawyers

I think it's high time for PLT people to clarify notions of "source code" vs. "object code", "linked" vs. "communicating", etc., as these are currently defined and used in courts in an inconsistent manner (probably because the FSF started its work during the C era).

Mobile code, metaprogramming, partial evaluation, proof-carrying code, even the trivial notion of bytecode: how should they be treated by the GPL?

Re: FSF policy

Fascinating! I noticed similar logic being used in Stallman's article Why you shouldn't use the Library GPL for your next library:
when a library provides a significant unique capability, like GNU Readline, that's a horse of a different color. The Readline library implements input editing and history for interactive programs, and that's a facility not generally available elsewhere. Releasing it under the GPL and limiting its use to free programs gives our community a real boost. At least one application program is free software today specifically because that was necessary for using Readline.

In the case of Readline, it's just a question of license choice, with the usual restrictions on the ability of Free Software to be exploited by non-Free Software. The choice of the GPL over the LGPL doesn't affect someone reusing this code in a Free Software project.

The case with GCC is quite different, although it has the same competitive motivation. The problem is that for GCC, the competitive advantage for Free Software is created by imposing technical constraints, which affect both Free and non-Free software projects. This puts the FSF in a similar position to any commercial organization which imposes deliberate restrictions on its code for competitive or economic reasons. It reduces the user's freedom to use the code, according to the FSF's own principles of freedom. It discourages or prevents certain otherwise reasonable uses of the code, even in Free Software projects.

Stallman's argument is that it's OK to violate the FSF's own principles in this respect, because the greater good of Free Software in general is served. However, that seems to me an arguable conjecture. The case of the GCC Introspector is a nice example of the problem. I wonder if, in a theoretical twist of irony, the FSF could attempt to use the DMCA's anti-circumvention provisions to go after a project like Introspector, which could be said to circumvent GCC's technical protection measures.

P.S. Sorry for the off-topic post. LLVM 1.5's tail call support sounds cool. As does its lack of technical measures preventing reuse. ;)

Not only off-topic

I wonder if, in a theoretical twist of irony, the FSF could attempt to use the DMCA's anti-circumvention provisions to go after a project like Introspector, which could be said to circumvent GCC's technical protection measures.

This is not only off-topic but also inflammatory.

Stallman's argument is that it's OK to violate the FSF's own principles in this respect...

I don't think this statement is true. If you're sincerely interested in this question there are a lot of places on the Web to find more information. I think the basis of the discussion is that there is no single definition of "the user's freedom to use the code."

GCC as a reusable resource

I think the basis of the discussion is that there is no single definition of "the user's freedom to use the code."

True, there's no such single definition. However, the FSF has such a definition, laid out in the GPL. It seems to me that aspects of that particular definition of freedom are in fact being inhibited by technical means in GCC. My "theoretical twist of irony" was intended to highlight this apparent contradiction. It's theoretical in the sense that the FSF presumably wouldn't take such action. However, the fact that they may be in a legal position to do so is a result of their own choice of policies. I pointed out the consequences of taking this contradiction to an extreme.

The sense in which I considered my post off-topic is that it wasn't about LLVM. It's still somewhat relevant, though, related to the point that LLVM apparently provides a benefit over GCC, in terms of the modularity and reusability of its IR.

In a more general sense, the technical aspects of this issue are certainly relevant to LtU. We've had various stories about GCC, such as GCC Wiki, GCC 3.0: The State of the Source, and Compilation of Functional Programming Languages using GCC -- Tail Calls. As the latter story points out, the Glasgow Haskell Compiler relies on GCC. GCC is obviously a resource which other language or tool implementors may be interested in using, in various ways. The fact that certain uses of GCC have been technically restricted is relevant information here. If we're going to discuss it any further, perhaps we should open a new forum thread for it.

gcc has a free software boost

You have to consider that the GCC effort has historically taken advantage of the strange idea of non-modularity.
For example, if GCC has an Objective-C compiler, it is because Apple/NeXT could not take it for themselves.

The FSF's ideas are debatable, but in this case they seem to have been successful.