Compiler Cost Model for Multicore Architectures






The move from single-core to multicore architectures was intended to increase system performance and, in turn, application performance. However, obtaining optimal application performance on multicore architectures is not trivial and remains an unsolved problem because of the multiple challenges these architectures face. The main source of these challenges is the inability to utilize system resources well enough. Ineffective utilization or poor coordination of resources may create performance bottlenecks and overheads that ultimately degrade the overall performance of an application. We have identified three main causes of performance degradation on multicore architectures: false sharing, memory bandwidth, and shared last-level cache contention. Knowing the degree to which application performance would degrade due to these three issues would indicate to an application programmer or compiler which code transformation is needed to reduce this negative performance impact. Unfortunately, current state-of-the-art compilers such as Open64 and GNU are oblivious to these performance bottlenecks. Even though these compilers, especially Open64, have very robust optimization and code transformation phases, they are all limited to sequential programs and simple architectures with single processor units. This limitation makes their optimization phases less accurate on multicore architectures. To improve compilers' code transformation and optimization phases, the cost models that guide optimizations should be extended to consider the performance bottlenecks that can occur on multicore architectures.
Therefore, the goal of this dissertation is to develop compile-time models that quantitatively estimate the impact of these three performance-degrading bottlenecks on overall application performance, and that can be used as extensions to existing compilers' cost models when guiding optimizations and/or code transformations targeting multicore architectures.



Compilers, Cost model, False sharing, Memory bandwidth, Shared cache contention