Benchmarking Hadoop performance on different distributed storage systems

DC field | Value | Language
dc.contributor | Aalto-yliopisto | fi
dc.contributor | Aalto University | en
dc.contributor.advisor | Döngelci, Ridvan |
dc.contributor.author | Mukherjee, Alapan |
dc.contributor.school | Perustieteiden korkeakoulu (School of Science) | fi
dc.contributor.supervisor | Heljanko, Keijo |
dc.date.accessioned | 2015-09-18T08:27:44Z |
dc.date.available | 2015-09-18T08:27:44Z |
dc.date.issued | 2015-08-24 |
dc.description.abstract | Distributed storage systems have been in use for years and have undergone significant architectural changes to store data reliably and cost-effectively. As the demand for data grows, there has been a shift from disk-centric to memory-centric computing, in which data is kept in memory rather than on disk. The primary motivation for this shift is faster data processing. However, it may also require a different approach to fault tolerance: instead of data replication, other techniques may be considered. One example of an in-memory distributed storage system is Tachyon. Instead of replicating data files in memory, Tachyon provides fault tolerance by keeping a record of the operations needed to generate each data file. If a file is lost, these operations are replayed to regenerate it. This approach is termed lineage. Tachyon is already deployed by many well-known companies. This thesis compares the storage performance of Tachyon with that of two on-disk storage systems, HDFS and Ceph. The work first studies the architectures of well-known distributed storage systems. Its major contribution is integrating Tachyon with Ceph as the underlying storage layer, analyzing how this integration affects Tachyon's performance, and determining how to tune Tachyon to extract maximum performance from it. | en
dc.format.extent | 99+11 |
dc.format.mimetype | application/pdf | en
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/17713 |
dc.identifier.urn | URN:NBN:fi:aalto-201509184328 |
dc.language.iso | en | en
dc.programme | Master's Programme in Mobile Computing - Services and Security | fi
dc.programme.major | Mobile Computing | fi
dc.programme.mcode | T-110 | fi
dc.rights.accesslevel | openAccess |
dc.subject.keyword | Tachyon | en
dc.subject.keyword | HDFS | en
dc.subject.keyword | Ceph | en
dc.subject.keyword | benchmarks | en
dc.title | Benchmarking Hadoop performance on different distributed storage systems | en
dc.type | G2 Pro gradu, diplomityö (master's thesis) | en
dc.type.okm | G2 Pro gradu, diplomityö (master's thesis) |
dc.type.ontasot | Master's thesis | en
dc.type.ontasot | Diplomityö (master's thesis) | fi
dc.type.publication | masterThesis |
local.aalto.idinssi | 52047 |
local.aalto.inssiarchivenr | 3020 |
local.aalto.inssilocation | P1 Ark Aalto |
local.aalto.openaccess | yes |
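The lineage approach described in the abstract can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not Tachyon's API or implementation. The class `LineageStore` and its `write`, `read`, and `evict` methods are hypothetical. A Python dict stands in for the in-memory tier, and each recorded "operation" is a plain function over the contents of its input files. Instead of keeping replicas, the store records which operation and which inputs produced each file. When a file has been lost, it is recomputed on read, recursively if its inputs are also lost.

```python
from typing import Callable, Dict, List, Tuple


class LineageStore:
    """Toy in-memory store that tracks lineage instead of keeping replicas.

    Hypothetical illustration only; not Tachyon's API.
    """

    def __init__(self) -> None:
        # File contents currently held "in memory".
        self.memory: Dict[str, bytes] = {}
        # Lineage record: file name -> (operation, names of its input files).
        self.lineage: Dict[str, Tuple[Callable[..., bytes], List[str]]] = {}
        # Number of files rebuilt by replaying their lineage.
        self.recomputations = 0

    def write(self, name: str, op: Callable[..., bytes], inputs: List[str]) -> bytes:
        """Run `op` on the contents of `inputs`, store the result, record its lineage."""
        data = op(*(self.read(i) for i in inputs))
        self.memory[name] = data
        self.lineage[name] = (op, list(inputs))
        return data

    def read(self, name: str) -> bytes:
        """Return a file's contents, replaying its lineage if it was lost."""
        if name in self.memory:
            return self.memory[name]
        # Raises KeyError if the file was never written (no lineage to replay).
        op, inputs = self.lineage[name]
        # Inputs that were also lost are recomputed recursively by read().
        data = op(*(self.read(i) for i in inputs))
        self.memory[name] = data
        self.recomputations += 1
        return data

    def evict(self, name: str) -> None:
        """Simulate losing a file from memory (e.g. a worker failure)."""
        self.memory.pop(name, None)


if __name__ == "__main__":
    store = LineageStore()
    store.write("raw", lambda: b"hello world", [])
    store.write("upper", lambda d: d.upper(), ["raw"])
    store.write("count", lambda d: str(len(d.split())).encode(), ["upper"])

    # Lose two derived files, then read the last one back.
    store.evict("upper")
    store.evict("count")
    print(store.read("count"), store.recomputations)  # b'2' 2
```

The sketch deliberately omits persisting data or lineage records to disk, so it shows only the recompute-on-loss idea. It also assumes every recorded operation is deterministic, since replaying a non-deterministic operation could produce different data.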

Files

Original bundle

Name: master_Mukherjee_Alapan_2015.pdf
Size: 1.94 MB
Format: Adobe Portable Document Format