Performance Evaluation of SQL and NoSQL Database Management Systems in a Cluster
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
2017 (English). In: International Journal of Database Management Systems, ISSN 0975-5705, Vol. 9, no 6, p. 1-24. Article in journal (Refereed), Published
Abstract [en]

In this study, we evaluate the performance of SQL and NoSQL database management systems, namely Cassandra, CouchDB, MongoDB, PostgreSQL, and RethinkDB. We use a cluster of four nodes to run the database systems, with external load generators. The evaluation is conducted using data from Telenor Sverige, a telecommunication company that operates in Sweden. The experiments are conducted using three datasets of different sizes. The write throughput and latency as well as the read throughput and latency are evaluated for four queries, namely distance query, k-nearest neighbour query, range query, and region query. For write operations, Cassandra has the highest throughput when multiple nodes are used, whereas PostgreSQL has the lowest latency and the highest throughput for a single node. For read operations, MongoDB has the lowest latency for all queries. However, Cassandra has the highest throughput for reads. The throughput decreases as the dataset size increases for both write and read, for both sequential and random order access. However, this decrease is more significant for random read and write. We also present the experience we had with these different database management systems, including setup and configuration complexity.
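The two metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' benchmark harness: `measure` and `range_query` are hypothetical names, and the in-memory list of points stands in for a real database and for the Telenor data, which are not part of this record.

```python
import statistics
import time


def range_query(points, box):
    """Hypothetical stand-in for a range query: return the (x, y) points inside box."""
    xmin, ymin, xmax, ymax = box
    return [p for p in points if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]


def measure(operation, requests):
    """Run operation once per request.

    Returns (throughput in operations per second, mean latency in seconds).
    """
    latencies = []
    start = time.perf_counter()
    for req in requests:
        t0 = time.perf_counter()
        operation(req)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    if not latencies:
        return 0.0, 0.0
    throughput = len(latencies) / elapsed if elapsed > 0 else float("inf")
    return throughput, statistics.mean(latencies)


if __name__ == "__main__":
    # Assumed toy data: a 100 x 100 grid of points and 200 identical query boxes.
    points = [(x, y) for x in range(100) for y in range(100)]
    boxes = [(10, 10, 20, 20)] * 200
    throughput, latency = measure(lambda b: range_query(points, b), boxes)
    print(f"throughput: {throughput:.1f} ops/s, mean latency: {latency * 1e3:.3f} ms")
```

Throughput is measured over the whole run while latency is measured per request, which is why a system can lead on one metric and not the other, as the abstract reports for Cassandra and MongoDB on reads.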

Place, publisher, year, edition, pages
Academy and Industry Research Collaboration Center (AIRCC), 2017. Vol. 9, no 6, p. 1-24
Keywords [en]
Trajectory queries, cluster computing, SQL database
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:bth-15754
DOI: 10.5121/ijdms.2017.9601
OAI: oai:DiVA.org:bth-15754
DiVA, id: diva2:1173839
Funder
Knowledge Foundation, 20140032
Available from: 2018-01-14 Created: 2018-01-14 Last updated: 2024-04-10 Bibliographically approved
In thesis
1. Performance Aspects of Databases and Virtualized Real-time Applications
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Context: High computing system performance depends on the interaction between software and hardware layers in modern computer systems. Two strong trends that affect different layers in computer systems are that single processors are now more or less completely replaced by multiprocessors, which are often organized into clusters, and the virtualization of resources. The performance evaluation of different software on such physical and virtualized resources is the focus of this thesis.

Objectives: The objectives of this thesis are to investigate the performance of SQL and NoSQL database management systems, namely Cassandra, CouchDB, MongoDB, PostgreSQL, and RethinkDB, and of a soft real-time application, namely voice-driven web. Scheduling algorithms for resource allocation for hard real-time applications on virtual processors are also investigated.

Methods: Experiments are used to measure the performance of SQL and NoSQL database management systems on a cluster. They are also used to develop a prototype and to predict the processor performance of voice-driven web on multiprocessors. Theoretical methods are used to model and design algorithms to schedule real-time applications on the virtual processor machine. Simulation is used to quantify the performance implications of certain parameter values in our theoretical results and to compare expected performance with theoretical bounds in our schedulability tests.

Results: The performance of Cassandra, CouchDB, MongoDB, PostgreSQL, and RethinkDB is evaluated in terms of write and read throughput and latency in cluster computing. For read throughput, all database systems are horizontally scalable as the number of cluster nodes increases; however, only Cassandra and CouchDB exhibit scalability for data writing. The overall evaluation shows that Cassandra has the most scalable write throughput as the number of nodes increases, with a relatively low latency, whereas PostgreSQL has the lowest write latency and MongoDB has the lowest read latency.

The architectural tradeoffs of voice-driven web show that the voice engine should be installed on the server rather than on the mobile device, and performance evaluations show that the speech engine scales with the number of cores in the multiprocessor, with and without hyperthreading.

The thesis presents scheduling techniques for real-time applications that run in virtual machines which time-share the processor. Using these techniques, each virtual machine's period and execution time can be defined so that real-time applications meet their deadlines. Simulation results show the impact of the length of different VM periods with respect to overhead. The tradeoffs between resource consumption and period length are also given. Furthermore, a utilization-based test for scheduling real-time applications on a virtual multiprocessor is presented. This test determines whether a task set is schedulable. If the task set is schedulable, the algorithm provides the priority for each task. This algorithm avoids Dhall's effect, which may cause task sets with even very low utilization to miss deadlines.
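Dhall's effect can be made concrete with the standard textbook construction, sketched below. This is not the thesis's algorithm or test, which this abstract does not describe. The function name `dhall_example` and the chosen parameters are illustrative assumptions. The sketch assumes global rate-monotonic priorities on m identical processors, with all tasks released at time 0: m light tasks with execution time 2ε and period 1, and one heavy task with execution time 1 and period 1 + ε.

```python
from fractions import Fraction


def dhall_example(m, eps):
    """Textbook Dhall's-effect task set on m processors (illustrative only).

    Light tasks: execution time 2*eps, period 1 (m of them).
    Heavy task:  execution time 1, period 1 + eps.
    Deadlines equal periods; all tasks are released at time 0.
    """
    eps = Fraction(eps)
    light = [(2 * eps, Fraction(1))] * m   # (execution time, period)
    heavy = (Fraction(1), 1 + eps)

    # Under global rate-monotonic scheduling, shorter period means higher
    # priority, so the m light tasks occupy all m processors during [0, 2*eps).
    # The heavy task can only start at 2*eps and then runs without interruption.
    heavy_finish = 2 * eps + heavy[0]

    total_util = sum(c / t for c, t in light) + heavy[0] / heavy[1]
    return {
        "utilization_per_processor": total_util / m,
        "heavy_finish": heavy_finish,
        "heavy_deadline": heavy[1],
        "misses": heavy_finish > heavy[1],
    }


if __name__ == "__main__":
    result = dhall_example(m=4, eps=Fraction(1, 100))
    print(float(result["utilization_per_processor"]), result["misses"])
```

With these assumed values the heavy task finishes at 1 + 2ε, after its deadline at 1 + ε, even though the per-processor utilization is well below one half. Increasing m lowers the per-processor utilization further without removing the miss, which is the behaviour a priority assignment has to avoid.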

Conclusions: The thesis presents the performance evaluation of read and write throughput and latency for SQL and NoSQL database management systems in cluster computing. The thesis quantifies the tradeoffs of voice-driven web architectures and the performance scalability of the speech engine with respect to the number of cores of the multiprocessor. Furthermore, this thesis proposes scheduling algorithms for real-time applications with hard deadlines on virtual processors, either as a single-core processor or as a multicore processor.

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2018. p. 228
Series
Blekinge Institute of Technology Doctoral Dissertation Series, ISSN 1653-2090 ; 02
Keywords
SQL and NoSQL database, Bigdata management systems, Structured and non-structured Database Evaluation, Voice-driven web, Multicore performance prediction, Hard real-time Scheduling, Virtual Multiprocessor Scheduling
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-15758 (URN)
978-91-7295-348-2 (ISBN)
Public defence
2018-09-21, J1650, Blekinge Tekniska Högskola, 371 79 Karlskrona, Karlskrona, 13:00 (English)
Funder
Sida - Swedish International Development Cooperation Agency
Available from: 2018-01-15 Created: 2018-01-14 Last updated: 2019-09-13 Bibliographically approved
