Implementation of the Hadoop MapReduce algorithm on virtualized shared storage systems
(Computer Science)
2016 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]

Context: Hadoop is an open-source software framework for distributed storage and distributed processing of large data sets. Implementing the Hadoop MapReduce algorithm on virtualized shared storage, without the Hadoop Distributed File System (HDFS), is a challenging task. In this study, the Hadoop MapReduce algorithm is implemented on the Compuverde software, which provides virtualized shared storage of data.

Objectives: This study identifies the effect of using virtualized shared storage with the Hadoop framework. The main objective is to design a method for implementing the Hadoop MapReduce algorithm on the Compuverde software, which provides virtualized shared storage of big data. Finally, the performance of the MapReduce algorithm on Compuverde shared storage (Compuverde File System, CVFS) is evaluated and compared with its performance on HDFS.

Methods: First, a literature study is conducted to identify the effect of implementing Hadoop on virtualized shared storage. The Compuverde software, the concepts of the MapReduce algorithm, and the functioning of HDFS are examined in detail during this literature study. The second research method adopted for this study is implementation: a method is developed in which the Hadoop MapReduce algorithm runs on the Compuverde software, which provides the virtualized shared storage, without HDFS. The final step is experimentation, in which the performance of the MapReduce algorithm on Compuverde shared storage (CVFS) is compared with its performance on the Hadoop Distributed File System.
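To make the map, shuffle, and reduce phases mentioned above concrete, the following is a minimal single-process sketch in plain Python. It is not the thesis implementation and does not use Hadoop or Compuverde APIs; the function names and the `input_dir` mount point are hypothetical, chosen only to illustrate reading files in place from an ordinary directory rather than copying them into another file system.

```python
from collections import defaultdict
from pathlib import Path


def map_phase(text):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    return [(word.lower(), 1) for word in text.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(input_dir):
    """Run the three phases over every .txt file directly under input_dir.

    input_dir is a hypothetical mount point standing in for shared storage;
    the files are read where they are, with no intermediate copy.
    """
    pairs = []
    for path in sorted(Path(input_dir).glob("*.txt")):
        pairs.extend(map_phase(path.read_text(encoding="utf-8")))
    return reduce_phase(shuffle_phase(pairs))
```

In a real cluster the map and reduce functions would run in parallel on separate nodes; this sketch only shows the data flow between the phases.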

Results: The experiment is conducted in two scenarios, a CPU-bound scenario and an I/O-bound scenario. In the CPU-bound scenario, the average execution time of the WordCount program grows linearly with the size of the data set. This linear growth is observed for both file systems, HDFS and CVFS. The same holds in the I/O-bound scenario, where growth is linear for both file systems. When the average execution times are plotted, both file systems perform similarly in the CPU-bound scenario (multi-node environment). In the I/O-bound scenario (multi-node environment), HDFS slightly outperforms CVFS when the data set size is 1.0 GB, and both file systems perform without much difference when the data set size is 0.5 GB or 1.5 GB.
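The averaging of execution times over growing data set sizes can be sketched as a small timing harness. This is an illustrative assumption about how such a measurement might be structured, not the procedure used in the thesis: the helper names, the in-memory word-count workload, and the synthetic input text are all hypothetical, and no measured values are implied.

```python
import time
from statistics import mean


def average_execution_time(job, runs=3):
    """Run job() `runs` times and return the mean wall-clock time in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        job()
        durations.append(time.perf_counter() - start)
    return mean(durations)


def count_words(text):
    """Hypothetical stand-in workload: count word occurrences in memory."""
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def benchmark(sizes_in_words, runs=3):
    """Map each input size (in words) to its average execution time."""
    results = {}
    for size in sizes_in_words:
        # Synthetic input of roughly `size` words, built from a 3-word phrase.
        text = "hadoop mapreduce storage " * (size // 3)
        results[size] = average_execution_time(lambda t=text: count_words(t), runs)
    return results
```

Calling `benchmark([300_000, 600_000, 900_000])` returns one average per size; plotting those averages against size is one way to check whether growth looks linear on a given machine.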

Conclusions: The MapReduce algorithm can be implemented on live data in virtualized shared storage systems without copying the data into HDFS. In a single-node environment, distributed storage systems perform better than shared storage systems. In a multi-node environment, HDFS and CVFS perform similarly in the CPU-bound scenario. In the I/O-bound scenario, HDFS performs slightly better than CVFS for a 1.0 GB data set. Hence we conclude that, in a multi-node environment, distributed storage systems perform similarly to shared storage systems in both the CPU-bound and I/O-bound scenarios.

Place, publisher, year, edition, pages
2016. p. 35
Series
Blekinge Tekniska Högskola Forskningsrapport, ISSN 1103-1581
Keywords [en]
Hadoop, virtualized systems, shared storage, MapReduce, Hadoop Distributed File System
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-11876
OAI: oai:DiVA.org:bth-11876
DiVA, id: diva2:926050
External cooperation
Compuverde, Karlskrona
Subject / course
DV2566 Master's Thesis (120 credits) in Computer Science
Educational program
DVACS Master of Science Programme in Computer Science
Presentation
2016-01-25, J1640, Blekinge Institute of Technology, Karlskrona, 09:00 (English)
Supervisors
Examiners
Available from: 2016-05-04 Created: 2016-05-03 Last updated: 2018-01-10 Bibliographically approved

Open Access in DiVA

fulltext (1352 kB), 690 downloads
File information
File name: FULLTEXT02.pdf
File size: 1352 kB
Checksum (SHA-512): c248164a847689d17d6dde4ab91a379f5ee52b1c0e869bdb7d007c025a2ad6624bc6f3e4908793c46869a7a3a54cb20c636097509e02e1b61c5c8af5a29748df
Type: fulltext
Mimetype: application/pdf

Author
Nethula, Shravya

Total: 443 hits