Cache Support in a High Performance Fault-Tolerant Distributed Storage System for Cloud and Big Data
2015 (English). In: 2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops, IEEE Computer Society, 2015, pp. 537-546. Conference paper (refereed).
Due to the trends towards Big Data and Cloud Computing, one would like to provide large storage systems that are accessible by many servers. A shared storage can, however, become a performance bottleneck and a single point of failure. Distributed storage systems provide a shared storage to the outside world, but internally they consist of a network of servers and disks, thus avoiding the performance-bottleneck and single-point-of-failure problems. We introduce a cache in a distributed storage system. The cache system must be fault tolerant so that no data is lost in case of a hardware failure. This requirement excludes the use of the common write-invalidate cache consistency protocols. The cache is implemented and evaluated in two steps. The first step focuses on design decisions that improve the performance when only one server accesses a given file. In the second step we extend the cache with features that address the case when more than one server accesses the same file. The cache improves the throughput significantly compared to having no cache. The two-step evaluation approach makes it possible to quantify how different design decisions affect the performance of different use cases.
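The fault-tolerance requirement in the abstract can be illustrated with a minimal sketch. Under a write-invalidate protocol, the only up-to-date copy of a dirty block may live in a single server's cache, so a crash of that server loses data. A cache that synchronously replicates every write to a backup before acknowledging it avoids this. All class and method names below are hypothetical illustrations, not the paper's actual implementation:

```python
# Minimal sketch (hypothetical names, not the paper's design): a cache
# that replicates each write to a backup store before acknowledging it,
# so a crash of the caching server loses no acknowledged data.

class BackupStore:
    """Stands in for a replica held on another server or disk."""
    def __init__(self):
        self.blocks = {}

    def replicate(self, key, value):
        self.blocks[key] = value


class FaultTolerantCache:
    def __init__(self, backup):
        self.backup = backup
        self.cache = {}

    def write(self, key, value):
        # Replicate first: the write is durable before it is acknowledged.
        self.backup.replicate(key, value)
        self.cache[key] = value

    def read(self, key):
        return self.cache.get(key)

    def crash(self):
        # Simulate losing the caching server: all cached data vanishes...
        self.cache.clear()

    def recover(self):
        # ...but every acknowledged write survives in the backup replica.
        self.cache = dict(self.backup.blocks)
```

A write-invalidate scheme, by contrast, would only invalidate copies in other caches on a write, leaving the sole dirty copy vulnerable until it is eventually written back.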
Place, publisher, year, edition, pages
IEEE Computer Society, 2015. pp. 537-546.
Keywords
big data; cloud; distributed storage systems; cache; performance evaluation
Identifiers
URN: urn:nbn:se:bth-11411
DOI: 10.1109/IPDPSW.2015.65
ISI: 000380446100062
ISBN: 978-1-4673-9739-1
OAI: oai:DiVA.org:bth-11411
DiVA: diva2:894241
Conference
IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW), Hyderabad
Projects
Bigdata@BTH - Scalable resource-efficient systems for big data analytics