Towards a New Directive-based Tasking API for Distributed Systems

Programming for large-scale computing requires programming models carefully designed for that purpose. MPI is often the model of choice for distributed systems, but writing MPI programs is time-consuming, and the programs become harder to maintain and debug as they grow. Moreover, MPI does not exploit some of the potential benefits of shared-memory systems, and using a hybrid model that pairs MPI with a shared-memory model requires a high level of programmer expertise. Designing algorithms in terms of tasks can reduce development effort and offers many performance-related advantages. In addition, directive-based programming styles have made parallel programming, and the migration of serial code to multicore chips, easier than ever. Although directive-based tasking models have paved the way toward distributed systems, they still lack capabilities necessary for efficient large-scale computing.

TagHit is an API proposed by the HPCTools group in the Department of Computer Science at the University of Houston. Targeted at exascale computing, TagHit combines the benefits of task-based programming models with the simplicity of directive-based programming styles. This thesis addresses task creation and scheduling in TagHit. First, I present an overview of six existing task-based programming models. Next, I propose an experimental runtime design for TagHit's task creation and scheduling modules and describe in detail a prototype implementation of that runtime. The goal of this work is to guide the definition of TagHit's concepts and semantics and to assess the implementation cost and challenges of creating and scheduling tasks in TagHit. Finally, I present two TagHit benchmarks whose results show that the design and implementation support the general concept of TagHit, with good speedup and scheduling behavior.

Parallel, Distributed, Shared Memory, Distributed systems, Distributed memory, Directive-based, MPI, API, Tasking, Task-based, Task Scheduling, Work-stealing, Exascale, Large-scale computing, Computing systems, Programming