
What next?

Hello Everyone,
Two questions from a person with too much time on his hands:
1) What is the next (LtU-related) book/paper/etc. you would read, if you had the chance?
2) What introductory material would you recommend to a layman interested in learning more about the topic of the book mentioned in question 1?

Just curious,
-Ryan

Edit: Generalized question from "books" to "books, papers, or similar things".

Automatic program parallelization for multicore CPUs as a software problem

I just came across an article about Intel having massive multicore CPUs ready in the near future. CPUs like these will require a fundamental change in how we write programs, or so they say (there are plenty of discussions about this going around).

But do we really have to change the way we write software? I wonder if a part of the chip could automatically make different threads of execution out of a single thread, by correlating data dependencies at run time. Just as with branch prediction, a similar piece of logic could 'predict' data correlations and thus separate the instruction stream into different threads, as if the code were multithreaded. A special lookaside buffer could be used to prevent simultaneous access to memory locations: since data are always fetched from the cache, the cache itself could contain flags for monitoring simultaneous access. The CPU would catch the simultaneous-access event and adjust its threads and prediction statistics accordingly, so that the conflict is avoided the next time.
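To make this concrete, here is the kind of single-threaded code I have in mind (just an illustration I made up): the two accumulations below form independent dependence chains, so hardware that tracks data dependencies could, in principle, peel them apart into two threads:

    #include <stddef.h>

    /* Hypothetical single-threaded code with two independent dependence
     * chains: `sum` and `prod` never read each other's values, so the
     * speculative hardware described above could run the two chains
     * as separate threads of execution. */
    void sum_and_product(const double *a, size_t n,
                         double *sum_out, double *prod_out)
    {
        double sum = 0.0;   /* chain 1: touches only a[] and sum  */
        double prod = 1.0;  /* chain 2: touches only a[] and prod */
        for (size_t i = 0; i < n; i++) {
            sum  += a[i];
            prod *= a[i];
        }
        *sum_out  = sum;    /* the chains stay independent to the end */
        *prod_out = prod;
    }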

I am asking this on LtU because I think the problem is essentially a software problem. In my mind, an instruction always targets a memory location (either directly or indirectly through registers), so the CPU could monitor dependent instructions and separate them into different threads of execution. The mechanism could even be extended to registers, since registers are essentially memory locations inside the CPU.
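Just to illustrate what I mean (this is a toy software model, not any real CPU's mechanism, and all the names are made up): reduce each instruction to the addresses it reads and writes, and flag a read-after-write on the same address as a dependence:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>

    #define TABLE_SIZE 1024   /* made-up size; one slot per hashed address */

    /* Last writer per (hashed) address; instruction ids start at 1,
     * so 0 means "no writer seen yet". */
    static uint32_t last_writer[TABLE_SIZE];

    static size_t slot(uintptr_t addr)
    {
        return (addr >> 3) % TABLE_SIZE;  /* ignore offsets within a word */
    }

    /* Record that instruction `id` wrote to `addr`. */
    void note_write(uint32_t id, uintptr_t addr)
    {
        last_writer[slot(addr)] = id;
    }

    /* A read of `addr` by instruction `id` depends on an earlier write
     * iff another instruction wrote that address first. Hash collisions
     * only produce false dependences, which is safe, merely conservative. */
    bool read_depends_on_write(uint32_t id, uintptr_t addr)
    {
        uint32_t w = last_writer[slot(addr)];
        return w != 0 && w != id;  /* true: keep the two in one thread */
    }

Real hardware would also have to handle write-after-read ordering, table eviction, and rollback on misprediction, but an address-keyed table like this is the core of the idea.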

Assuming that dependency tracking could take place at run time, what programming language semantics are required to help the hardware run the software more efficiently? Are C's semantics enough? For example, Fortran loops can be automatically vectorized because they do not contain pointer aliases.
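A concrete illustration of the aliasing issue (my own example): in the first function below, the compiler must assume dst and src may overlap, e.g. dst == src + 1, which creates a loop-carried dependence; the C99 restrict qualifier in the second function asserts no aliasing, giving the compiler the same freedom Fortran arguments carry by default:

    #include <stddef.h>

    /* The compiler cannot freely vectorize this: if dst == src + 1, each
     * iteration writes a value that the next iteration reads. */
    void scale(double *dst, const double *src, double k, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = k * src[i];
    }

    /* C99 restrict promises the arrays do not overlap, recovering the
     * no-alias guarantee that Fortran provides implicitly. */
    void scale_noalias(double *restrict dst, const double *restrict src,
                       double k, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = k * src[i];
    }

With restrict (or after proving non-overlap, possibly via a runtime check), a compiler is free to vectorize or parallelize the loop; without it, the compiler has to assume the worst.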

Finally, are purely functional programming languages better, at the semantic level, for automatic parallelization? On the surface it certainly seems so, but what about languages that are translated to C (or equivalent) code? Aren't those programs under the same constraints as C? Would we have to eliminate C as the middleman and encode parallelization hints (coming straight from the FP semantics) directly in the instruction stream?
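To make the "C as middleman" worry concrete, imagine a purely functional `map f xs` compiled to the C below (the function name and shape are my own invention). The purity of f, which made every iteration independent in the source language, is exactly the fact that C cannot express, so a C back end has to be as conservative as with hand-written code:

    #include <stddef.h>
    #include <stdlib.h>

    /* Imagined output of an FP compiler translating `map f xs` into C.
     * In the source language, `out` is freshly allocated and `f` is pure,
     * so all n calls are independent; plain C can no longer say that f
     * has no side effects, so the loop must be assumed serial. */
    double *map_f(double (*f)(double), const double *xs, size_t n)
    {
        double *out = malloc(n * sizeof *out);
        if (out == NULL)
            return NULL;
        for (size_t i = 0; i < n; i++)
            out[i] = f(xs[i]);  /* independent iterations, if f is pure */
        return out;
    }

That loss of information is what makes me suspect the hints would have to survive past C somehow, as annotations or directly in the instruction stream.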