Improving Efficiency of Data Compaction by Creating & Evaluating a Random Compaction Strategy in Apache Cassandra
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering.
2020 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Background: Cassandra is a NoSQL database that stores its data in immutable tables called SSTables. These SSTables undergo a process called compaction to reclaim disk space and improve read performance. Size Tiered Compaction Strategy and Leveled Compaction Strategy are the most commonly used generic compaction strategies for different use cases. Their main limitations are space amplification and write amplification, respectively. This research aims to address the limitations of these existing generic compaction strategies.

Objectives: A new random compaction strategy will be created to improve the efficiency and effectiveness of compaction. This strategy will be evaluated by comparing its read, write, and space amplification with those of the existing generic compaction strategies across different use cases.

Methods: In this study, Design Science was used as the research method to answer both research questions. Focus group meetings were conducted to gain knowledge about the limitations of the existing compaction strategies and of the newly created random compaction strategy, and about appropriate solutions to them. During the evaluation, metrics were collected from a Prometheus server and visualized in a Grafana server. The compaction strategies were compared by performing statistical tests.

Results: The results of this study showed that the Random Compaction Strategy performs almost the same as the Leveled Compaction Strategy. The Random Compaction Strategy solves the space amplification problem of the Size Tiered Compaction Strategy and the write amplification problem of the Leveled Compaction Strategy. Eight important metrics were analyzed for all three compaction strategies.

Conclusions: The main artefact of this research is a new Random Compaction Strategy. After two iterations, a stable Random Compaction Strategy was designed. The results were analyzed by comparing the Size Tiered Compaction Strategy, the Leveled Compaction Strategy, and the Random Compaction Strategy on two different use cases. The new Random Compaction Strategy performed well for the Ericsson buffer management use case.

Place, publisher, year, edition, pages
2020. p. 93
Keywords [en]
Apache Cassandra, Compaction Strategy, Random Compaction, NoSQL, Design Science.
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:bth-20182
OAI: oai:DiVA.org:bth-20182
DiVA, id: diva2:1453076
External cooperation
Ericsson, Karlskrona
Subject / course
PA2534 Master's Thesis (120 credits) in Software Engineering
Educational program
PAAPT Master of Science Programme in Software Engineering
Presentation
2020-06-02, Online Zoom Meeting, Karlskrona, 10:00 (English)
Supervisors
Examiners
Available from: 2020-07-09 Created: 2020-07-08 Last updated: 2020-07-09. Bibliographically approved

Open Access in DiVA

Improving Efficiency of Data Compaction by Creating & Evaluating a Random Compaction Strategy in Apache Cassandra (2938 kB), 632 downloads
File information
File name: FULLTEXT02.pdf
File size: 2938 kB
Checksum SHA-512: 5c8e44e884cae17d8a4c3f9e30cd2e160baed030de6e5d0a2048805a7006e73a2a1b2d31395c695d8aa2974240f2803f6925aced51a716e9203aa8aef27f9c9b
Type: fulltext
Mimetype: application/pdf
