Benchmarking Hadoop performance on different distributed storage systems

Perustieteiden korkeakoulu (School of Science) | Master's thesis
Date
2015-08-24
Major/Subject
Mobile Computing
Mcode
T-110
Degree programme
Master's Programme in Mobile Computing - Services and Security
Language
en
Pages
99+11
Abstract
Distributed storage systems have been in use for years, and their architectures have changed significantly to store data reliably and cost-effectively. As the demand for data grows, computing has shifted from disk-centric to memory-centric designs, which keep data in memory rather than on disk. The primary motivation is faster data processing. This shift may, however, call for a different approach to fault tolerance: instead of replicating data, other techniques may be considered. One example of an in-memory distributed storage system is Tachyon. Instead of replicating data files in memory, Tachyon provides fault tolerance by recording the operations needed to generate each data file and replaying them if the file is lost. This approach is termed lineage. Tachyon is already deployed by many well-known companies. This thesis compares the storage performance of Tachyon with that of the on-disk storage systems HDFS and Ceph. After a study of the architectures of well-known distributed storage systems, the major contribution of the work is to integrate Tachyon with Ceph as an underlayer storage system, to understand how this affects Tachyon's performance, and to determine how to tune Tachyon for maximum performance.
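The lineage idea described in the abstract can be illustrated with a toy sketch. The Python below is not Tachyon's API and is not taken from the thesis; the class `LineageStore` and its methods `put_source`, `derive`, `evict`, and `read` are hypothetical names chosen for illustration. It assumes that source files stay available (in a real deployment they would live in an underlayer storage system) and that every operation is deterministic, so replaying it reproduces the same bytes.

```python
from typing import Callable, Dict, List, Tuple


class LineageStore:
    """Toy in-memory store that rebuilds lost files by replaying operations.

    Hypothetical illustration only; this is not Tachyon's interface.
    """

    def __init__(self) -> None:
        # File name -> file contents currently held in memory.
        self._data: Dict[str, bytes] = {}
        # Derived file name -> (operation, names of its input files).
        self._lineage: Dict[str, Tuple[Callable[..., bytes], List[str]]] = {}

    def put_source(self, name: str, payload: bytes) -> None:
        """Store a source file that has no lineage of its own."""
        self._data[name] = payload

    def derive(self, name: str, op: Callable[..., bytes], inputs: List[str]) -> None:
        """Record how `name` is produced, then compute and cache it."""
        self._lineage[name] = (op, list(inputs))
        self._data[name] = op(*(self.read(i) for i in inputs))

    def evict(self, name: str) -> None:
        """Simulate losing the in-memory copy of a file."""
        self._data.pop(name, None)

    def read(self, name: str) -> bytes:
        """Return the file, replaying its lineage if the copy was lost."""
        if name not in self._data:
            if name not in self._lineage:
                raise KeyError(f"{name!r} is lost and has no recorded lineage")
            op, inputs = self._lineage[name]
            # Inputs are read recursively, so a lost input is rebuilt first.
            self._data[name] = op(*(self.read(i) for i in inputs))
        return self._data[name]


def sort_words(raw: bytes) -> bytes:
    """Deterministic example operation: sort whitespace-separated words."""
    return b" ".join(sorted(raw.split()))


store = LineageStore()
store.put_source("raw", b"b a c")
store.derive("sorted", sort_words, ["raw"])
store.evict("sorted")            # the derived file is lost
recovered = store.read("sorted")  # recomputed from "raw", not from a replica
```

The trade-off this sketch makes visible is that only one copy of each derived file occupies memory, at the cost of recomputation time when a file is lost.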
Supervisor
Heljanko, Keijo
Thesis advisor
Döngelci, Ridvan
Keywords
Tachyon, HDFS, Ceph, benchmarks