ACCELERATING DATA-INTENSIVE COMPUTATIONS THROUGH DYNAMIC NETWORK TRAFFIC OPTIMIZATION
Hadoop has emerged as a popular distributed framework for data-intensive computing in clustered environments. It is mainly used for parallel computing problems in which parts of the data are transferred between individual compute nodes of interconnected clusters to accomplish one job. The clusters are usually connected by shared network infrastructure on which other applications also transfer data and compete for the same bandwidth. Specifically, Hadoop MapReduce jobs suffer when running alongside other traffic in the underlying network because they are sensitive to delay between compute phases. We propose a dynamic priority mechanism, realized with the OpenFlow protocol, that gives Hadoop traffic a preferred QoS policy over all other traffic on such an infrastructure. Moreover, the proposed priority mechanism can be enhanced if additional information about traffic in the underlying network is provided. We propose to use the emerging ALTO (Application-Layer Traffic Optimization) server to provide network traffic information to Hadoop. The ALTO server will be based on the industry standard IF-MAP (Interface for Metadata Access Points) protocol, to leverage its publish/subscribe capabilities and flexible schema definitions.
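As a rough illustration of the dynamic priority idea described above, the following Python sketch builds an OpenFlow-style rule that steers Hadoop traffic into a preferred queue only while a job is active. It is a minimal sketch under stated assumptions, not the mechanism as implemented: the names `Flow`, `is_hadoop_flow`, and `build_rule`, the queue IDs, the priority values, and the dict layout are all hypothetical, and identifying Hadoop traffic by TCP port 50060 (the default TaskTracker HTTP port in Hadoop 1.x) is an assumption that may not match a given cluster. No real controller API is used.

```python
from dataclasses import dataclass

# Assumption: Hadoop shuffle traffic is recognized by TCP port.
# 50060 was the default TaskTracker HTTP port in Hadoop 1.x.
HADOOP_SHUFFLE_PORTS = {50060}

# Hypothetical switch queue IDs and rule priorities.
HIGH_PRIORITY_QUEUE = 1
DEFAULT_QUEUE = 0
HADOOP_RULE_PRIORITY = 200
DEFAULT_RULE_PRIORITY = 100


@dataclass(frozen=True)
class Flow:
    """A TCP flow described by its endpoints (illustrative type)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def is_hadoop_flow(flow: Flow) -> bool:
    """Return True if either endpoint uses an assumed Hadoop shuffle port."""
    return (flow.src_port in HADOOP_SHUFFLE_PORTS
            or flow.dst_port in HADOOP_SHUFFLE_PORTS)


def build_rule(flow: Flow, job_active: bool) -> dict:
    """Build an OpenFlow-style rule as a plain dict.

    Hadoop flows are enqueued on the high-priority queue only while a
    job is active; all other traffic stays on the default queue.
    """
    prioritized = job_active and is_hadoop_flow(flow)
    return {
        "match": {
            "ipv4_src": flow.src_ip,
            "ipv4_dst": flow.dst_ip,
            "tcp_src": flow.src_port,
            "tcp_dst": flow.dst_port,
        },
        "priority": HADOOP_RULE_PRIORITY if prioritized else DEFAULT_RULE_PRIORITY,
        "actions": [{
            "type": "enqueue",
            "queue_id": HIGH_PRIORITY_QUEUE if prioritized else DEFAULT_QUEUE,
        }],
    }
```

In this sketch the rule is recomputed whenever the job state changes, which is one simple way to make the priority dynamic; network information from an ALTO server could in principle feed the same decision, but that part is not modeled here.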