Evaluation of Idempotency & Block Size of Data on the Performance of Normalized Compression Distance Algorithm
Blekinge Institute of Technology, School of Computing.
2012 (English). Independent thesis, Advanced level (degree of Master (Two Years)). Student thesis.
Abstract [en]

Normalized compression distance (NCD) is a similarity metric that can be used to identify the type of file fragments. The performance of NCD depends on the underlying compression algorithm. We studied three compressors: bzip2, gzip and ppmd. The compression ratio of ppmd is better than that of bzip2, which in turn is better than that of gzip; this study evaluates which of the three is best in terms of idempotency. We then applied NCD, with k-nearest neighbours as the classification algorithm, to randomly selected public corpus data split into blocks of different sizes (512, 1024, 1536 and 2048 bytes). The performance of two compressors, bzip2 and gzip, is also compared for the NCD algorithm from the perspective of idempotency.
Objectives: In this study we investigated the combined effect of two parameters, compression ratio versus idempotency and the block size of the data, on the performance of NCD. The objective is to determine whether, for better NCD performance, the compressor used by NCD should be selected on the basis of a better compression ratio or of better idempotency. Different block sizes were used to evaluate whether the performance of NCD improves when the block size of the data used to build the datasets is varied.
Methods: Experiments were performed to test the hypotheses and to evaluate the effect of compression ratio versus idempotency, and of data block size, on the performance of NCD.
Results: The null hypotheses of the main experiment were retained. There is no statistically significant difference in the performance of NCD when the block size of the data is varied, and no statistically significant difference in NCD's performance when the compressor for NCD is selected on the basis of a better compression ratio or of better idempotency.
Conclusions: Because the experiments could not reject the null hypotheses of the main experiment, no conclusion can be drawn about the effect of the independent variables on the dependent variable; that is, no statistically significant effect of compression ratio versus idempotency, or of varying data block size, on the performance of NCD was found.
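For reference, the usual definition of NCD (Cilibrasi and Vitányi's formulation) and the idempotency property of a normal compressor can be written as below, where C(·) is the compressed length in bytes and xy is the concatenation of x and y. Using NCD(x, x) as a numeric measure of how far a compressor is from idempotent is an assumption made here; this record does not state which measure the thesis uses.

```latex
\mathrm{NCD}(x,y) = \frac{C(xy) - \min\{C(x),\,C(y)\}}{\max\{C(x),\,C(y)\}}

C(xx) \approx C(x) \quad\Longrightarrow\quad \mathrm{NCD}(x,x) = \frac{C(xx) - C(x)}{C(x)} \approx 0
```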

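The pipeline described in the abstract (NCD over a compressor, an idempotency check, fixed-size blocks, k-nearest-neighbour classification) can be sketched in Python under some assumptions. The standard-library `bz2` and `gzip` modules stand in for the bzip2 and gzip tools; ppmd has no standard-library binding and is omitted. Idempotency is measured as NCD(x, x). The helper names (`clen`, `ncd`, `idempotency_error`, `split_blocks`, `knn_classify`) and the two synthetic "file types" are hypothetical illustrations, not the thesis's code or corpus.

```python
import bz2
import gzip
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# A compressor maps raw bytes to compressed bytes; only the output length is used.
Compressor = Callable[[bytes], bytes]

# Standard-library stand-ins for the bzip2 and gzip tools (ppmd is not available here).
COMPRESSORS = {"bzip2": bz2.compress, "gzip": gzip.compress}


def clen(data: bytes, compress: Compressor) -> int:
    """C(data): length in bytes of the compressed representation."""
    return len(compress(data))


def ncd(x: bytes, y: bytes, compress: Compressor) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = clen(x, compress), clen(y, compress)
    cxy = clen(x + y, compress)
    return (cxy - min(cx, cy)) / max(cx, cy)


def idempotency_error(x: bytes, compress: Compressor) -> float:
    """NCD(x, x) = (C(xx) - C(x)) / C(x); 0.0 for a perfectly idempotent compressor.

    Assumption: this is one plausible idempotency measure, not necessarily the thesis's.
    """
    return ncd(x, x, compress)


def split_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut data into non-overlapping blocks of exactly block_size bytes; a short tail is dropped."""
    return [data[i:i + block_size] for i in range(0, len(data) - block_size + 1, block_size)]


def knn_classify(fragment: bytes, training: Sequence[Tuple[bytes, str]],
                 k: int, compress: Compressor) -> str:
    """Label a fragment by majority vote among its k nearest training blocks under NCD."""
    nearest = sorted((ncd(fragment, block, compress), label) for block, label in training)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Usage: two hypothetical synthetic "file types" instead of a real public corpus.
text = b"the quick brown fox jumps over the lazy dog. " * 100   # repetitive ASCII text
table = bytes((i * 7) % 251 for i in range(4096))             # structured binary pattern

for block_size in (512, 1024, 1536, 2048):                     # block sizes from the abstract
    t_blocks, b_blocks = split_blocks(text, block_size), split_blocks(table, block_size)
    # Hold out the last block of each type as the query; train on the rest.
    training = [(b, "text") for b in t_blocks[:-1]] + [(b, "table") for b in b_blocks[:-1]]
    for name, compress in COMPRESSORS.items():
        print(block_size, name,
              knn_classify(t_blocks[-1], training, 1, compress),
              knn_classify(b_blocks[-1], training, 1, compress),
              round(idempotency_error(t_blocks[0], compress), 3))
```

With k = 1 each fragment takes the label of its single closest training block. Larger values of k smooth over noisy neighbours, but they need more training blocks per class than the 1536- and 2048-byte splits of these small synthetic inputs provide.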
Place, publisher, year, edition, pages
2012, 62 p.
Keyword [en]
Normalized Compression Distance, NCD, Idempotency, Compression ratio, Block Size of Data.
National Category
Information Systems, Computer Science
Identifiers
URN: urn:nbn:se:bth-4303
Local ID: oai:bth.se:arkivex9C66E84920C68FF9C1257AC300480137
OAI: oai:DiVA.org:bth-4303
DiVA: diva2:831635
Uppsok
Technology
Available from: 2015-04-22
Created: 2012-11-27
Last updated: 2015-06-30
Bibliographically approved

Open Access in DiVA

File information
File name: FULLTEXT01.pdf
File size: 1901 kB
Checksum (SHA-512): e220e9d9713369f29fbbe641957eaaedee640cc9e040e1e9cc7c7485fc85ce24049cdf1ede641518c43a41112588e75f5ece690793ae893801737880f21992fa
Type: fulltext
Mimetype: application/pdf
