Extraction and Energy Efficient Processing of Streaming Data
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering. (Department of Computer Science and Engineering)
2017 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

The interest in machine learning algorithms is increasing, in parallel with the advances in hardware and software required to mine large-scale datasets. Machine learning algorithms account for a significant amount of the energy consumed in data centers, which affects global energy consumption. However, these algorithms are typically optimized for predictive performance and scalability, not for energy consumption. Algorithms with low energy consumption are necessary for embedded systems and other resource-constrained devices, and desirable for platforms that perform many computations, such as data centers. Data stream mining investigates how to process potentially infinite streams of data without the need to store all of the data. This ability is particularly useful for companies that generate data at a high rate, such as social networks.
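Processing a stream "without the need to store all the data" means working in a single pass with bounded memory. As an illustration only (this is not code from the thesis), the minimal Python sketch below computes the running mean and sample variance of an unbounded stream with Welford's algorithm, keeping three numbers in memory instead of the whole stream. The generator `sensor_stream` is a hypothetical data source.

```python
import itertools
import random
from typing import Iterable, Tuple


def running_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Single pass over a stream (Welford's algorithm); O(1) memory."""
    n = 0
    mean = 0.0
    m2 = 0.0  # sum of squared deviations from the current mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else 0.0  # sample variance
    return n, mean, variance


def sensor_stream(seed: int = 0):
    """Hypothetical unbounded stream of readings (mean 10, std 2)."""
    rng = random.Random(seed)
    while True:
        yield rng.gauss(10.0, 2.0)


if __name__ == "__main__":
    # Consume only the first 100,000 items of the unbounded generator;
    # no reading is ever stored.
    n, mean, var = running_stats(itertools.islice(sensor_stream(), 100_000))
    print(n, round(mean, 2), round(var, 2))
```

The same one-pass, constant-memory pattern underlies stream learners such as Hoeffding trees, which keep sufficient statistics at each leaf rather than the raw instances.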

This thesis investigates algorithms in the data stream mining domain from an energy efficiency perspective. The thesis comprises two parts. The first part explores how to extract and analyze data from Twitter, with a pilot study that investigates the correlation between hashtags and followers. The second, and main, part investigates how energy is consumed and optimized in an online learning algorithm suitable for data stream mining tasks.

The second part of the thesis focuses on analyzing, understanding, and reformulating the Very Fast Decision Tree (VFDT) algorithm, the original Hoeffding tree algorithm, into an energy efficient version. It presents three key contributions. First, it shows at a high level how the energy consumption of the VFDT varies when its parameters are tuned. Second, it presents a methodology to identify energy bottlenecks in machine learning algorithms by pinpointing the functions of the VFDT that consume the most energy. Third, it introduces dynamic parameter adaptation for Hoeffding trees, a method that adapts the parameters of Hoeffding trees at runtime to reduce their energy consumption. The results show an average energy reduction of 23% for the VFDT algorithm.
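The parameters tuned in this work control the split test of the VFDT. As background, here is a minimal Python sketch of the standard Hoeffding-bound split rule from the original VFDT formulation; it is not the thesis's code. A leaf that has seen n instances splits when the gain difference ΔG between the two best attributes exceeds ε = sqrt(R² ln(1/δ) / (2n)), or when ε falls below the tie threshold τ, and the test is only run every n_min instances because it is costly. The function names and the parameter values (δ = 1e-7, τ = 0.05, n_min = 200) are assumptions chosen for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2 * n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int,
                 value_range: float, delta: float, tau: float) -> bool:
    """VFDT-style split test at a leaf that has seen n instances."""
    eps = hoeffding_bound(value_range, delta, n)
    delta_g = best_gain - second_gain
    # Split if the best attribute beats the runner-up with confidence
    # 1 - delta, or if epsilon has shrunk below the tie threshold tau.
    return delta_g > eps or eps < tau


if __name__ == "__main__":
    # Two classes: information gain lies in [0, log2(2)] = [0, 1], so R = 1.
    R, delta, tau, n_min = 1.0, 1e-7, 0.05, 200
    # The (expensive) split test only runs every n_min instances.
    for n in range(n_min, 1001, n_min):
        print(n, round(hoeffding_bound(R, delta, n), 4),
              should_split(0.30, 0.19, n, R, delta, tau))
```

With these illustrative gains (ΔG = 0.11), ε shrinks as n grows and the split becomes allowed at one of the later checkpoints. Raising n_min means fewer split tests, and so less computation, at the cost of possibly splitting later.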

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2017.
Series
Blekinge Institute of Technology Licentiate Dissertation Series, ISSN 1650-2140; 3
Keyword [en]
machine learning, green computing, data mining, data stream mining, green machine learning
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-15532; OAI: oai:DiVA.org:bth-15532; DiVA: diva2:1159312
Presentation
2017-12-18, J1640, Blekinge Tekniska Högskola, 371 79, Karlskrona, 13:00 (English)
Projects
Scalable resource-efficient systems for big data analytics
Funder
Knowledge Foundation, 20140032
Available from: 2017-11-22; Created: 2017-11-22; Last updated: 2017-11-22; Bibliographically approved
List of papers
1. Hashtags and followers: An experimental study of the online social network Twitter
2016 (English). In: Social Network Analysis and Mining, ISSN 1869-5450, Vol. 6, no. 1, UNSP 12. Article in journal (Refereed), Published
Abstract [en]

We analyzed data from 502,891 Twitter users, focusing on the potential correlation between hashtags and follower growth, to determine whether adding hashtags to tweets produces new followers. We designed an experiment with two groups of users: one tweeting with random hashtags and one tweeting without hashtags. The results show a correlation between hashtags and followers: on average, users tweeting with hashtags gained 2.88 followers, while users tweeting without hashtags gained 0.88 followers. We present a simple, reproducible approach to extract and analyze Twitter user data for this and similar purposes.

Place, publisher, year, edition, pages
Springer, 2016
Keyword
Experimental study, Correlational analysis, Hashtags, Followers
National Category
Media and Communication Technology; Other Computer and Information Science
Identifiers
urn:nbn:se:bth-13048 (URN); 10.1007/s13278-016-0320-6 (DOI); 000381220500012
Available from: 2016-09-30; Created: 2016-09-30; Last updated: 2017-11-28; Bibliographically approved
2. Energy Efficiency Analysis of the Very Fast Decision Tree Algorithm
2017 (English). In: Trends in Social Network Analysis: Information Propagation, User Behavior Modeling, Forecasting, and Vulnerability Assessment / [ed] Rokia Missaoui, Talel Abdessalem, Matthieu Latapy, Cham, Switzerland: Springer, 2017, p. 229-252. Chapter in book (Refereed)
Abstract [en]

Data mining algorithms are usually designed to optimize a trade-off between predictive accuracy and computational efficiency. This paper introduces energy consumption and energy efficiency as important factors to consider during the analysis and evaluation of data mining algorithms. We conducted an experiment to illustrate how energy consumption and accuracy are affected when the parameters of the Very Fast Decision Tree (VFDT) algorithm are varied. These results are compared with a theoretical analysis of the algorithm, indicating that energy consumption is affected by parameter design and that it can be reduced significantly while maintaining accuracy.

Place, publisher, year, edition, pages
Cham, Switzerland: Springer, 2017
Series
Lecture Notes in Social Networks, ISSN 2190-5428
Keyword
Energy efficiency, Green computing, Very Fast Decision Tree, Big Data
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-15489 (URN); 10.1007/978-3-319-53420-6_10 (DOI); 978-3-319-53419-0 (ISBN); 978-3-319-53420-6 (ISBN)
Funder
Knowledge Foundation, 20140032
Available from: 2017-11-14; Created: 2017-11-14; Last updated: 2017-11-27; Bibliographically approved
3. Identification of Energy Hotspots: A Case Study of the Very Fast Decision Tree
2017 (English). In: GPC 2017: Green, Pervasive, and Cloud Computing / [ed] Au M., Castiglione A., Choo KK., Palmieri F., Li KC., Cham, Switzerland: Springer, 2017, Vol. 10232, p. 267-281. Conference paper, Published paper (Refereed)
Abstract [en]

Large-scale data centers account for a significant share of the energy consumption in many countries. Machine learning technology generates intensive workloads and thus drives the demand for power and cooling capacity in data centers. It is time to explore green machine learning. The aim of this paper is to profile a machine learning algorithm with respect to its energy consumption and to determine the causes behind this consumption. The first scalable machine learning algorithm able to handle large volumes of streaming data is the Very Fast Decision Tree (VFDT), which produces results competitive with those of algorithms that analyze static datasets. Our objectives are to: (i) establish a methodology that profiles the energy consumption of decision trees at the function level, (ii) apply this methodology in an experiment to obtain the energy consumption of the VFDT, (iii) conduct a fine-grained analysis of the functions that consume most of the energy, providing an understanding of that consumption, and (iv) analyze how different parameter settings can significantly reduce the energy consumption. The results show that by addressing the most energy-intensive part of the VFDT, the energy consumption can be reduced by up to 74.3%.

Place, publisher, year, edition, pages
Cham, Switzerland: Springer, 2017
Series
Lecture Notes in Computer Science
Keyword
Machine learning, Big data, Very Fast Decision Tree, Green machine learning, Data mining, Data stream mining
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-15490 (URN); 10.1007/978-3-319-57186-7_21 (DOI); 978-3-319-57185-0 (ISBN); 978-3-319-57186-7 (ISBN)
Conference
International Conference on Green, Pervasive and Cloud Computing (GPC), Cetara, Amalfi Coast, Italy
Funder
Knowledge Foundation, 20140032
Available from: 2017-11-14; Created: 2017-11-14; Last updated: 2017-11-27; Bibliographically approved
4. Hoeffding Trees with nmin adaptation
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Machine learning software accounts for a significant amount of the energy consumed in data centers. These algorithms are usually optimized towards predictive performance, i.e. accuracy, and scalability. This is also the case for data stream mining algorithms. Although these algorithms adapt to the incoming data, their parameters are fixed at the beginning of the execution, which leads to energy hotspots. We present dynamic parameter adaptation for data stream mining algorithms to trade off energy efficiency against accuracy at runtime. To validate this approach, we introduce the nmin adaptation method to improve parameter adaptation in Hoeffding trees. This method dynamically adapts the number of instances needed to make a split (nmin) and thereby reduces the overall energy consumption. We designed an experiment to compare the Very Fast Decision Tree algorithm (VFDT, the original Hoeffding tree algorithm) with nmin adaptation against the standard VFDT. The results show that VFDT with nmin adaptation consumes up to 89% less energy than the standard VFDT, trading off a few percent of accuracy. Our approach can be used to trade off energy consumption against predictive and computational performance in the drive towards resource-aware machine learning.
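The abstract does not spell out how nmin is adapted. The minimal Python sketch below assumes one plausible rule, labeled hypothetical: after a failed split attempt, instead of re-checking after a fixed nmin, the leaf waits for the number of instances at which the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)) would fall below the larger of ΔG and τ, assuming the observed gains stay roughly constant. This is not the manuscript's code; `adapt_nmin`, `instances_needed` and the parameter values are hypothetical.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon = sqrt(R^2 * ln(1/delta) / (2 * n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def instances_needed(value_range: float, delta: float, target_eps: float) -> int:
    """Smallest n for which the Hoeffding bound is at most target_eps
    (solve epsilon(n) = target_eps for n and round up)."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta)
                     / (2.0 * target_eps ** 2))


def adapt_nmin(best_gain: float, second_gain: float, n_seen: int,
               value_range: float, delta: float, tau: float,
               default_nmin: int) -> int:
    """Hypothetical adaptation rule applied after a failed split attempt."""
    delta_g = best_gain - second_gain
    # The split test passes once epsilon < delta_g or epsilon < tau,
    # i.e. once epsilon < max(delta_g, tau), if the gains stay as observed.
    target = max(delta_g, tau)
    remaining = instances_needed(value_range, delta, target) - n_seen
    # Never check more often than the default schedule.
    return max(default_nmin, remaining)


if __name__ == "__main__":
    R, delta, tau, default_nmin = 1.0, 1e-7, 0.05, 200
    # Close competitors after 200 instances: postpone the next check.
    print(adapt_nmin(0.30, 0.25, 200, R, delta, tau, default_nmin))
    # Clear winner: keep the default schedule.
    print(adapt_nmin(0.50, 0.10, 200, R, delta, tau, default_nmin))
```

The design intent under this assumption is to skip split attempts that are very unlikely to succeed, since each attempt evaluates every candidate attribute. The trade-off is that, if the gains change, a split may happen later than under the fixed schedule, which is consistent with the small accuracy cost reported above.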

Keyword
Hoeffding trees, data stream mining, green computing, green machine learning, energy efficiency
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-15493 (URN)
Funder
Knowledge Foundation, 20140032
Available from: 2017-11-14; Created: 2017-11-14; Last updated: 2017-11-22; Bibliographically approved

Open Access in DiVA

Licentiate_Eva_Garcia_Martin (5244 kB)
File information
File name: FULLTEXT01.pdf; File size: 5244 kB; Checksum (SHA-512):
634cac683885d56b8c0d493476988aa2ab346adb5013d294d56e7397a9acdcecb66355fdaff5b11bc1a22492d090045472f976bb65f35e796ec9156934b3f324
Type: fulltext; Mimetype: application/pdf

Author: García-Martín, Eva
Organisation: Department of Computer Science and Engineering
Subject: Computer Sciences