Replacing batch-based data extraction with event streaming with Apache Kafka: A comparative study
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering.
2022 (English)
Independent thesis Basic level (university diploma), 10 credits / 15 HE credits
Student thesis
Abstract [en]

For growing organisations that have built their data flow around a monolithic database server, an ever-increasing number of applications and an ever-increasing demand for data freshness will eventually push the existing system to its limits, prompting either hardware upgrades or an updated data architecture. By switching from full extractions of data at regular intervals to extracting only the changes, resource consumption could potentially be decreased while data freshness is simultaneously increased.

The objective of this thesis is to provide insights into how implementing an event streaming setup with Apache Kafka, connected to SQL Server through the Debezium source connector, affects resource consumption on the database server. Related studies have often focused on steps further downstream in the data pipeline, so this thesis can contribute to an area where more knowledge is needed.

In an empirical study using two different setups in the same system, traditional batch extraction and extraction through event streaming are measured and compared. The point of measurement is the SQL Server database from which data is extracted. Both memory utilisation and CPU utilisation are measured using SQL Server Profiler. Different table sizes, data volumes and intervals between changes are used to simulate different scenarios.

One of the takeaways of the results is that, at the same total number of changes, the size of the individual transactions has a large impact on the resource consumption caused by event streaming. The study shows that each transaction carries an overhead cost, and that the regular polling performed by the source connector consumes resources even when the system is idle.

The thesis concludes that event streaming can reduce resource consumption on the database server. However, when the source table is small and the number of changes is large, batch extraction is less resource-intensive.
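
The trade-off summarised above can be illustrated with a toy cost model. This is a minimal sketch under stated assumptions, not the method used in the thesis: the `Scenario` fields, the unit costs (`ROW_READ_COST`, `CHANGE_COST`, `TX_OVERHEAD`, `POLL_COST`) and all scenario values are hypothetical and were chosen only to make the qualitative effects visible. They are not measurements from the study. The model assumes that a batch run reads the whole table every time, while change extraction pays a fixed cost per poll, a fixed overhead per transaction and a cost per changed row.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    table_rows: int            # rows in the source table
    total_changes: int         # changed rows during the observation window
    rows_per_transaction: int  # changed rows carried by each transaction
    window_s: float            # length of the observation window, in seconds
    batch_interval_s: float    # seconds between full batch extractions
    poll_interval_s: float     # seconds between source-connector polls


# Hypothetical unit costs in arbitrary units. These are NOT values measured in
# the thesis; they only make the shape of the trade-off visible.
ROW_READ_COST = 1.0   # reading one row during a full extraction
CHANGE_COST = 1.0     # capturing one changed row
TX_OVERHEAD = 20.0    # fixed cost per captured transaction
POLL_COST = 5.0       # cost of one poll, paid even when nothing has changed


def batch_cost(s: Scenario) -> float:
    """Full extraction: every run reads the whole table, whatever changed."""
    runs = int(s.window_s // s.batch_interval_s)
    return runs * s.table_rows * ROW_READ_COST


def streaming_cost(s: Scenario) -> float:
    """Change extraction: pay per poll, per transaction and per changed row."""
    polls = int(s.window_s // s.poll_interval_s)
    transactions = -(-s.total_changes // s.rows_per_transaction)  # ceiling division
    return (polls * POLL_COST
            + transactions * TX_OVERHEAD
            + s.total_changes * CHANGE_COST)


HOUR = 3600.0
# Hypothetical scenarios: (table_rows, total_changes, rows_per_transaction,
# window_s, batch_interval_s, poll_interval_s)
large_table = Scenario(1_000_000, 10_000, 100, HOUR, 900.0, 0.5)
small_table = Scenario(1_000, 100_000, 1, HOUR, 900.0, 0.5)
# Same 100,000 changes as small_table, grouped into larger transactions.
small_table_batched_tx = Scenario(1_000, 100_000, 1_000, HOUR, 900.0, 0.5)
# No changes at all: streaming still pays for polling.
idle_table = Scenario(1_000_000, 0, 1, HOUR, 900.0, 0.5)

for name, s in [("large", large_table), ("small", small_table),
                ("small/big-tx", small_table_batched_tx), ("idle", idle_table)]:
    print(f"{name:>12}: batch={batch_cost(s):>12,.0f}  "
          f"streaming={streaming_cost(s):>12,.0f}")
```

With these made-up constants, streaming is far cheaper for the large table with few changes, batch extraction wins for the small table with many single-row transactions, grouping the same changes into larger transactions sharply reduces the streaming cost, and the idle scenario still pays for polling. Different constants move the break-even points, so real thresholds can only be found by measurement, as the study does.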

Place, publisher, year, edition, pages
2022, p. 66
Keywords [en]
etl, kafka connect, resource consumption, sql server
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:bth-23671
OAI: oai:DiVA.org:bth-23671
DiVA, id: diva2:1696556
Subject / course
PA1438 Självständigt arbete Webbprogrammering (Independent Project, Web Programming)
Educational program
PAGWG Webbprogrammering (Web Programming)
Supervisors
Examiners
Available from: 2022-10-13
Created: 2022-09-17
Last updated: 2022-10-13
Bibliographically approved

Open Access in DiVA

Replacing batch-based data extraction with event streaming with Apache Kafka: A comparative study (603 kB)
File information
File name: FULLTEXT01.pdf
File size: 603 kB
Checksum (SHA-512):
dbfbf21637a0347c0a44998ff7986914668e0af9380760c14bc5a1aa98ee5d40ddc0b78c804fb206ea42ace91d3ba0d2b5d3d356c09f02a181ea44ae7e1972d1
Type: fulltext
Mimetype: application/pdf
