Contributor: Gabriel, Edgar
Date accessioned: 2014-04-05
Date available: 2014-04-05
Date issued: December 2013 (2013-12)
URI: http://hdl.handle.net/10657/595

Abstract: The increasing number of cores per node has propelled the performance of leadership-scale systems from teraflops to petaflops. The bandwidth of I/O subsystems, on the other hand, has remained nearly stagnant. This has opened a wide gap between computation time and I/O time, making I/O a major bottleneck. Furthermore, the I/O bandwidth realized on such systems is generally far lower than the theoretical peak. The Message Passing Interface (MPI) has been the de facto standard for parallel computing over the past two decades. MPI-I/O, which is part of the MPI specification, not only offers a clean way for applications to access the file system but also acts as middleware between the application and the file system in which a variety of enhancements can be implemented. In particular, collective I/O has proven very effective for I/O on large-scale systems and helps bridge the gap between theoretical and sustained I/O bandwidth. This dissertation develops approaches to improve parallel I/O at this level. Specifically, it provides methods that use data-layout-aware rank assignment to improve I/O performance, overlap collective I/O with computation, and apply the principles of collective I/O to staging-based I/O architectures.

Format: application/pdf
Language: eng
Rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
Subjects: Parallel I/O; High performance computing; Computer science; Collective I/O; Delegate I/O
Title: On Scalable Collective I/O for High Performance Computing
Type: Thesis
Description: born digital