Automatic program parallelization for multicore CPUs as a software problem

I just came across an article saying Intel will have massive multicore CPUs ready in the near future. CPUs like these will require a fundamental change in how we write programs, or so they say (there are plenty of discussions about this). But do we really have to change the way we write software?

I wonder whether there could be a part of the chip that automatically splits a single thread of execution into several, by correlating data dependencies at run-time. Just as with branch prediction, a similar piece of logic could be used to 'predict' data correlations and thus separate the instruction stream into different threads, as if the code were multithreaded. A special lookaside buffer could be used to prevent simultaneous access to memory locations: since data are always fetched through the cache, the cache itself could hold flags for monitoring concurrent access. The CPU would catch a simultaneous-access event and adjust its threads and prediction statistics accordingly, so that next time the conflict is avoided.

I am asking LtU about this because I think the problem is essentially a software problem. In my mind, an instruction always targets a memory location (either directly or indirectly through registers), so the CPU could monitor dependent instructions and separate them into different threads of execution. The mechanism could be extended to registers, since registers are essentially memory locations inside the CPU. Assuming that dependency tracking could take place at run-time, what programming-language semantics are needed to help the hardware run the software more efficiently? Are C's semantics enough? Fortran loops, for example, can be automatically vectorized because they cannot contain pointer aliases.
Finally, are purely functional programming languages better, at the semantics level, for automatic parallelization? On the surface it certainly seems so, but what about languages that are compiled to C (or equivalent) code? Aren't those programs subject to the same constraints as C? Would we have to eliminate C as the middleman and encode parallelization hints (derived straight from the FP semantics) directly in the instruction stream?