C++ AMP - Accelerated Massive Parallelism

Yesterday, Microsoft announced a new approach to GPGPU, APU, and multi-/many-core programming in modern C++. C++ AMP (Accelerated Massive Parallelism) consists of a few language extensions plus an associated library. It builds on top of DirectX and the PPL (Parallel Patterns Library). C++ AMP is an open specification.

Details are here. As an aside, C++ AMP relies heavily on lambdas, which are part of C++11 (formerly C++0x). Lambda the ultimate...
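To make the lambda connection concrete, here is a minimal sketch of the programming model as announced (the vector-doubling kernel is just an illustration): the library pieces, array_view and parallel_for_each, live in the concurrency namespace in <amp.h>, while restrict(amp) is the language extension that marks the lambda as compilable for the accelerator.

```cpp
#include <amp.h>
#include <vector>
#include <iostream>

int main() {
    using namespace concurrency;

    std::vector<int> v(1024);
    for (int i = 0; i < 1024; ++i) v[i] = i;

    // Library part: array_view wraps existing host data for accelerator use.
    array_view<int, 1> av(static_cast<int>(v.size()), v);

    // Language part: restrict(amp) marks the lambda body as something the
    // GPU-capable back end must be able to compile.
    parallel_for_each(av.extent, [=](index<1> idx) restrict(amp) {
        av[idx] *= 2;
    });

    av.synchronize();            // copy results back to the host vector
    std::cout << v[10] << "\n";  // 20
}
```

Data movement between host and accelerator is folded into the array_view abstraction, which is part of how the program stays at "a couple of lines" rather than the page and a half of boilerplate mentioned in Herb's litmus test below.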

Herb Sutter's Keynote on the subject at this week's AMD Fusion Developer Summit

Daniel Moth's deep(er) dive at the AMD Fusion event

What do you think about the approach taken here?



i like it modulo some things

i've read some of the posts and have been watching one of the video presentations. if i were a c++ programmer who was happy to be stuck on microsoft tools, then i guess this would strike me as the bee's knees. it apparently has good integration with visual studio, with debugger views that are about the semantics of the libraries (e.g. show you task groups not just underlying threads), it has hardware abstraction via direct x, it has buy-in from intel, yadda yadda yadda.

for me it really makes me wonder (a) which more fp-ish languages already have this stuff in a decently performant way? and/or (b) which fp-ish languages could make use of this by running on top of the "concurrency runtime" layer underneath it?

(it cracked me up in a sad way that in the video i was watching the guy said something like oh we have both kinds, country /and/ western when he said oh we want to support c++ and c#, we're supporting all developers.)

furthermore, especially after having just watched the Odersky Scala concurrency thing i posted, and having used Clojure, it is interesting that i didn't notice any mention of immutability. it is still the same old everybody-banging-on-the-data-at-the-same-time-with-a-prayer approach, just perhaps raised up a notch? i guess that is simply what you have to do when you are embracing and extending c++. (well, and overall i think if you are dealing with tons of data, and aren't into gc, and want real performance, then you are likely still today to come down on the side of mangling the same location in memory over and over, rather than using 'persistent' data structures. i wonder what the real-world spread in performance is?)

furthermore, especially

furthermore, especially after having just watched the Odersky Scala concurrency thing i posted, and having used Clojure, it is interesting that i didn't notice any mention of immutability. it is still the same old everybody banging on the data at the same time with a prayer approach, just perhaps raised up a notch? i guess that is simply what you have to do when you are embracing and extending c++.

Immutability has never (or not yet*) played a role in high-performance parallel computing. Restrictions on how memory is accessed, yes, but those are imposed by the hardware rather than the programming model. Concurrency is really a different beast from parallel computing, even if the two look the same at a quick glance.

It's an open specification...

There is no vendor lock-in... C++ AMP is an open specification. This means any compiler vendor is free to implement it and incorporate it into their toolchains. Herb began his keynote with "we know not everyone uses Visual Studio...". I guess old perceptions are hard to change.

Herb says:

The main reason we decided to build a new model is that we believe there needs to be a single model that has all of the following attributes:

C++, not C: It should leverage C++’s power for strong abstraction without sacrificing performance, not just be a dialect of C.

Mainstream: It should be programmable by millions of developers, not just by a priesthood. Litmus test: Is the Hello World parallel GPU program a page and a half, or a couple of lines?

Minimal: It adds just one general-purpose language extension that addresses not only the immediate problem (dealing with cores that can’t support full C++) but many others. With the right general-purpose extension, the rest can be done as just a library.

Portable: It allows shipping a single EXE that can use any combination of GPU vendors’ hardware. The initial implementation uses DirectCompute and supports all devices that are DX11 capable; DirectCompute is just an implementation detail of the first release, and the model can (and I expect will) be implemented to directly talk to any interesting hardware.

General and future-proof: The initial release will focus on GPU computing, but it’s intended to enable people to write code for the GPU in a way that in the future we can recompile with few or no changes to spread across any and all accessible compute cores, including ones in the cloud.

Open: I mentioned that Microsoft intends to make the C++ AMP specification open, and encourages its implementation on other C++ compilers for any hardware or OS target. AMD announced that they will implement C++ AMP in their FSA reference compiler. NVidia also announced support.

At 54:05 in Herb's keynote, he says: "Microsoft intends to make C++ AMP an open specification that any compiler can implement. And we're working with our hardware partners to help them to build C++ AMP into C++ compilers for any hardware target, for any operating system target they want. We're helping them. And we're also pleased to announce that one of those is AMD, that AMD will be implementing C++ AMP in their FSA reference compiler for Windows and non-Windows platforms."

restrict targets

Must “restrict” targets be built into the compiler, or is there some way to write them as libraries?

Compiler

restrict has to be implemented as part of the language.
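A sketch of why (the helper functions here are hypothetical, just to show the syntax): the restriction specifier appears in the function declarator itself and constrains which contexts may call the function, so the compiler front end has to parse it and the back end has to honor it; a library has no hook for saying "this body must be compilable for a limited accelerator core."

```cpp
#include <amp.h>

// The restriction specifier is part of the function's declarator.
int twice(int x) restrict(amp)        { return x * 2; }  // accelerator code only
int twice_h(int x) restrict(cpu)      { return x * 2; }  // host code (the default)
int twice_b(int x) restrict(cpu, amp) { return x * 2; }  // callable from either context

// Only amp-restricted (or cpu,amp-restricted) functions may be called from
// inside a restrict(amp) lambda passed to parallel_for_each; the compiler
// enforces this at the call site, which is why it cannot be a library feature.
```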