|dc.description.abstract||To improve its expressivity with respect to unstructured parallelism, OpenMP 3.0 introduced the concept of tasks: independent units of work that may be dynamically scheduled and hence support efficient load balancing. Task synchronization is primarily accomplished via the insertion of taskwait and barrier constructs. However, these are global synchronizations and may incur significant overhead on large platforms. Certain algorithms would benefit substantially from finer-grained synchronization mechanisms. In this thesis, we extend the OpenMP tasking model to allow point-to-point synchronization among tasks in an OpenMP program. Such an approach enables us to support a dataflow model within OpenMP.
We propose language extensions to the current OpenMP task directive that enable fine-grained synchronization among asynchronous tasks sharing the same parent. A task waits only until the explicit dependencies specified by the programmer are satisfied, thereby avoiding expensive global synchronization points. The extensions are simple to use and promise an increase in the achievable concurrency for some parallel algorithms.
We have implemented our ideas fully within the OpenUH OpenMP runtime library. Applying the extensions to two algorithms, LU decomposition and Smith-Waterman, demonstrated significant performance improvements over the standard tasking versions of the two algorithms built with the GNU, Intel, OpenUH, PGI, Oracle/Sun, and Mercurium compilers. We compared our results with those obtained using related dataflow models, OmpSs and QUARK, and observed that the versions using our task extensions achieved an average speedup of 2x to 6x.||