FPGA CPUs

This might be a little bit off topic, as it's hardware related, but radical changes in hardware have a way of trickling up to the software. After listening to Ted Neward's keynote about the future of languages, I had a series of thoughts.

1. We really do have fundamental issues related to multi-core concurrency, distributed programming, and (by implication) emergent behaviors of our software systems.
2. FPGAs are dynamic hardware.
3. You could, in theory, bundle a CPU with an FPGA so they sit on the same die.
4. FPGAs are useful for truly running a Virtual Machine (rather than simulating one on the CPU).
5. This would be incredibly powerful: if, as Ted implied, DSLs expand in popularity, we could use the hardware itself as the VM.
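
To make points 4 and 5 a bit more concrete, here is a toy stack-machine VM in Haskell. The instruction set and names are invented purely for illustration; the point is that the step function is simple enough to picture as a block of FPGA logic plus a stack memory, rather than as an interpreter loop running on the CPU.

```haskell
-- Toy stack-machine VM; the instruction set is invented for illustration.
data Instr = Push Int | Add | Mul
  deriving Show

-- One fetch/execute step over (program, stack). Simple enough to imagine
-- as combinational logic plus a small stack memory.
stepVM :: ([Instr], [Int]) -> Maybe ([Instr], [Int])
stepVM (Push n : is, st)      = Just (is, n : st)
stepVM (Add : is, x : y : st) = Just (is, x + y : st)
stepVM (Mul : is, x : y : st) = Just (is, x * y : st)
stepVM _                      = Nothing

-- Run to completion, returning the final stack.
run :: [Instr] -> [Int]
run prog = go (prog, [])
  where go cfg = maybe (snd cfg) go (stepVM cfg)

main :: IO ()
main = print (run [Push 2, Push 3, Add, Push 4, Mul])   -- [20]
```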

One of the most immediate benefits I could envision was the ability to run a web server in conjunction with an interpreted language at much faster speeds. Or, in high-performance computing, building a multi-core pipeline tailored to your research simulation. I'm sure you can all think of many others; it's probably one of those technological marriages that spawn unexpectedly useful offspring.

Is this idea out there? Does it have a name? What researchers have explored this area? Are FPGAs currently useful for such applications? How long will we have to wait?

Obviously we'll need a DSL just to program the thing (as if multi-core alone weren't hard enough), and with it, a good silicon compiler.
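
As a rough idea of what such a DSL might look like from the software side, here is a minimal Haskell sketch in the spirit of Lava-style embedded hardware DSLs: circuits are ordinary expressions over a signal type, and a silicon compiler would turn the resulting netlist into gates. The Signal type and its constructors are invented for illustration, not taken from any real library.

```haskell
-- A toy netlist representation; constructors are invented for illustration.
data Signal = Input String
            | And2 Signal Signal
            | Xor2 Signal Signal
  deriving Show

-- A half adder described as a pure expression: (sum, carry).
halfAdder :: Signal -> Signal -> (Signal, Signal)
halfAdder a b = (Xor2 a b, And2 a b)

main :: IO ()
main = print (halfAdder (Input "a") (Input "b"))
```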


It's not that easy

Have you actually implemented a VM on an FPGA? It's not as easy as you make it sound. First, it's a fair effort to implement one at all. Second, making it run as fast as a well-written interpreter on a conventional CPU is quite challenging. Remember that an FPGA will probably run at about 1/10 of the clock speed of your regular CPU.

Personally I'd love to see an FPGA bundled with the CPU, but I think using it to gain performance on general purpose tasks is going to be difficult.

Reduceron

There is work on this, though I don't see an online copy of the article:


Matthew Naylor and Colin Runciman, The Reduceron: Widening the von Neumann Bottleneck for Graph Reduction using an FPGA, IFL 2007.

If I remember correctly, the FPGA they are using can do 140 MHz in theory, but their current implementation runs at 96 MHz. There were plenty of gates left for logic, but they were fairly constrained on memory. Apparently it compared quite well to a P4 at 2.8 GHz with respect to speed, but only small programs were possible due to memory limitations.
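
For readers who haven't seen graph reduction before, a rough sketch of the style of workload involved: the machine repeatedly rewrites a graph of function applications until no redex remains. The toy SKI reducer below is plain Haskell and is not the Reduceron's actual representation (the paper uses template instantiation of supercombinators), so take it only as an illustration of what "graph reduction" means here.

```haskell
-- Toy SKI combinator reducer; illustrative only, not the Reduceron's scheme.
data Term = S | K | I | App Term Term
  deriving Show

-- One leftmost-outermost reduction step, if a redex exists.
step :: Term -> Maybe Term
step (App I x)                 = Just x
step (App (App K x) _)         = Just x
step (App (App (App S f) g) x) = Just (App (App f x) (App g x))
step (App f x) =
  case step f of
    Just f' -> Just (App f' x)
    Nothing -> App f <$> step x
step _ = Nothing

-- Reduce to normal form (may diverge for some terms).
normalise :: Term -> Term
normalise t = maybe t normalise (step t)

main :: IO ()
main = print (normalise (App (App (App S K) K) I))   -- S K K I  ==>  I
```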

A draft version of the paper

A draft version of the paper seems to be available on the project's page.

Not just a theoretical possibility

You could, in theory, bundle a CPU with an FPGA so they sit on the same die

That isn't just a theoretical possibility. It's something that's available now. Hybrid CPU/FPGA chips are available from several vendors, although their target market tends to be embedded systems rather than desktop products. One example of a hybrid chip is Xilinx's Virtex-II Pro, which includes embedded PowerPC cores alongside the FPGA fabric.

To bring things back to a slightly more LtU-oriented focus, a nice summary of some of the issues surrounding programming models for such chips is Andrews et al., Programming Models for Hybrid CPU/FPGA Chips, IEEE Computer, January 2004. Within it you'll find references to languages and language tools such as Berkeley's Ptolemy II, Rosetta, SystemC, and HandelC.
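
As a hedged illustration of the coordination style several of those models revolve around (Handel-C, for instance, is built on CSP-style channels), here is a small Haskell sketch in which a software "CPU" thread streams work to a mock "FPGA" pipeline stage over channels and reads the results back. All the names and the squaring workload are invented; this only shows the communication pattern, not any real hybrid-chip API.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan
import Control.Monad (replicateM)

main :: IO ()
main = do
  toFpga   <- newChan
  fromFpga <- newChan
  -- Mock "FPGA" stage: squares every value it receives on its input channel.
  _ <- forkIO $ do
         xs <- getChanContents toFpga
         mapM_ (writeChan fromFpga . (^ 2)) (xs :: [Int])
  -- "CPU" side: feed the pipeline, then collect the results.
  mapM_ (writeChan toFpga) [1 .. 5 :: Int]
  results <- replicateM 5 (readChan fromFpga)
  print results   -- [1,4,9,16,25]
```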

It's been out there for a long while

The idea is called Reconfigurable Computing. It's been explored using FPGAs and via other means (e.g., in my graduate advisor's research project).

Remarks

- If it is hardware, it is not a 'virtual' machine. You can actually implement a processor and its peripherals in an FPGA, but it won't match the performance of hardwired ASIC chips. It can be interesting for replacing obsolete chips, for example in 68000-based computers.

- There are plenty of domain-specific processors out there: DSP variants for signal processing, 3D renderers, network processors for TCP/IP offloading, crypto processors, Java engines... Maybe we will have hardwired XML parsers/generators for web servers. FPGAs are great experimentation platforms for original CPU designs as well: you can build your own real Lisp or Smalltalk machine, with a hardware garbage collector, arbitrary-precision arithmetic...

- Most 'interpreted languages' (e.g., JavaScript, Python, PHP...) could probably run on the same kind of processor (like the Java coprocessor found in some IBM computers), so no dynamic reconfiguration would be required except for fixing bugs and optimisation. Once your design is 'perfect', and if you have some production volume, going to ASIC will raise performance and lower the cost. If a language is very slow (for example Ruby ;-), quite often it is not really the hardware's fault.

- Current 'mainstream' FPGAs are not appropriate for fast dynamic reconfiguration, for many reasons. There is no easy separation between blocks, so partial reconfiguration is nearly impossible; the bitstream is like a multi-megabyte statically linked executable. Downloading a large bitstream can take hundreds of milliseconds, and the configuration cannot be paged in and out like RAM during task switches. Neural networks can use FPGA principles, like programmable connections, but actual FPGAs are not well suited for that.

- Some high-level tools let you describe logic designs in software programming languages, for example plain C. This is great for complex signal-processing algorithms. Your research simulation example is perfectly valid.

- There is a difference between considering the FPGA as 'the program' and considering it as 'the processor which executes the program'. That difference can be blurry with microcoded chips, or when you consider that a CPU is a finite state machine and a single-threaded program is also a finite state machine.
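
As a toy illustration of that last remark, here is a small finite state machine written as a pure Haskell step function; the states and inputs are invented. Whether such a machine ends up as gates in an FPGA or as a function in software is purely an implementation decision.

```haskell
-- Toy Mealy machine: emits True each time it has just seen "ab".
-- States and inputs are invented for illustration.
type State = Int

fsmStep :: State -> Char -> (State, Bool)
fsmStep 1 'b' = (0, True)    -- just completed "ab"
fsmStep _ 'a' = (1, False)   -- saw an 'a', now waiting for 'b'
fsmStep _ _   = (0, False)   -- anything else resets

runFsm :: State -> String -> [Bool]
runFsm _ []       = []
runFsm s (c : cs) = let (s', out) = fsmStep s c in out : runFsm s' cs

main :: IO ()
main = print (runFsm 0 "abxab")   -- [False,True,False,False,True]
```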

RAMP project

UC Berkeley has a relatively new project called "RAMP" which is about using FPGAs as a bridge between simulating future processors (so slow that you can't run practical applications on them) and building them (by which time they aren't "future" anymore ;-P ). It's especially relevant considering the multicore revolution -- it's important to have a platform both for experimenting with hardware possibilities, and for learning how to write high-performance software on potential future architectures.

RAMP isn't so much about performance as simulation -- the idea isn't to make the future processors, but rather to "simulate" them in hardware so that programs can run reasonably fast on them. That way, you can get performance data for real-world applications. But I think they might find the "VM in hardware" thing interesting. Wasn't that what Sun had originally intended to do with the Java VM on embedded devices?

CPU + FPGA = GPU

Hi lennart --

The "CPU + FPGA" configuration already exists, in which the FGPA is a GPU. Only recently has the GPU part become easy to program, though. ATI had planned a release of a new API, but NVIDIA beat them to it with CUDA (which reminds me a bit of Brook).

That's very inaccurate.

The latest GPUs are indeed programmable, but by no means do you change the gates at the hardware level. Current GPUs are more precisely described as an arrangement of minimalistic processing cores, scheduled using a hardware load-balancing scheme. Another interesting thing about these processing cores is that, in addition to supporting L1/L2 and sometimes L3 caches, they also support hardware-accelerated texture lookups.

Yes, that's right

Yes, that's right, I should have thought carefully about the difference between GPUs and FPGAs. Thank you for pointing that out :-)