An Evaluation of Key-value Stores in Scientific Applications
Abstract
Big data analytics is a rapidly evolving multidisciplinary field that uses computing capacity, tools, techniques, and theories to solve scientific and engineering problems. With the big data boom, scientific applications now have to analyze huge volumes of data. NoSQL databases are gaining popularity for these types of applications because of their scalability and flexibility. Various types of NoSQL databases are available today, including key-value databases. Key-value databases are the simplest NoSQL databases: every item is stored as a key-value pair. In-memory key-value stores are specialized key-value databases that keep data in main memory instead of on disk. Hence, they are well suited to applications with frequent alternating read and write cycles.
The focus of this thesis is to analyze popular in-memory key-value stores and compare their performance. We compared them on parameters such as in-memory caching support, supported programming languages, scalability, and use from parallel applications. Based on the initial comparisons, we evaluated two key-value stores in detail, namely Memcached and Redis. To analyze these two data stores extensively, we developed a set of micro-benchmarks and ran them against both Memcached and Redis. The tests evaluated scalability, responsiveness, and data load handling capacity; Redis outperformed Memcached in all test cases.
To further analyze the in-memory caching ability of Redis, we integrated it as a caching layer into an air quality simulation based on Hadoop MapReduce, which calculates the eight-hour rolling average of ozone concentration at various sites in Houston, TX. Our aim was to compare the performance of the original air quality application, which uses the disk for data storage, with that of our application, which uses in-memory caching. Initial results show no performance gain from integrating Redis as a caching layer. Further optimization and configuration of the code are reserved for future work.
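For readers unfamiliar with the quantity being computed, the eight-hour rolling average can be illustrated with a minimal sketch. This is not the thesis's MapReduce implementation: the function name `rolling_average` is hypothetical, and the sketch assumes a trailing window over one site's complete, time-ordered hourly readings, with no handling of missing values.

```python
from collections import deque
from typing import Iterable, List


def rolling_average(values: Iterable[float], window: int = 8) -> List[float]:
    """Return the trailing rolling mean of hourly readings.

    Assumption (not from the thesis): the window trails the current hour,
    and a value is emitted only once `window` readings are available.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")

    averages: List[float] = []
    recent: deque = deque()   # readings currently inside the window
    running_sum = 0.0         # sum of the readings in `recent`

    for value in values:
        recent.append(value)
        running_sum += value
        # Drop the oldest reading once the window is over-full.
        if len(recent) > window:
            running_sum -= recent.popleft()
        # Emit an average only for complete windows.
        if len(recent) == window:
            averages.append(running_sum / window)

    return averages
```

Keeping a running sum makes each hour's update constant-time, so the cost per site grows linearly with the number of readings. In a MapReduce setting, one plausible arrangement would group readings by site before applying a function like this one, but that layout is an assumption and not a description of the application evaluated here.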