Publications (10 of 103)
Kusetogullari, H., Yavariabdi, A., Cheddad, A., Grahn, H. & Johan, H. (2019). ARDIS: A Swedish Historical Handwritten Digit Dataset. Neural computing & applications (Print)
2019 (English). In: Neural computing & applications (Print), ISSN 0941-0643, E-ISSN 1433-3058. Article in journal (Refereed), Epub ahead of print
Abstract [en]

This paper introduces a new image-based handwritten historical digit dataset named ARDIS (Arkiv Digital Sweden). The images in ARDIS dataset are extracted from 15,000 Swedish church records which were written by different priests with various handwriting styles in the nineteenth and twentieth centuries. The constructed dataset consists of three single digit datasets and one digit strings dataset. The digit strings dataset includes 10,000 samples in Red-Green-Blue (RGB) color space, whereas, the other datasets contain 7,600 single digit images in different color spaces. An extensive analysis of machine learning methods on several digit datasets is examined. Additionally, correlation between ARDIS and existing digit datasets Modified National Institute of Standards and Technology (MNIST) and United States Postal Service (USPS) is investigated. Experimental results show that machine learning algorithms, including deep learning methods, provide low recognition accuracy as they face difficulties when trained on existing datasets and tested on ARDIS dataset. Accordingly, Convolutional Neural Network (CNN) trained on MNIST and USPS and tested on ARDIS provide the highest accuracies 58.80% and 35.44%, respectively. Consequently, the results reveal that machine learning methods trained on existing datasets can have difficulties to recognize digits effectively on our dataset which proves that ARDIS dataset has unique characteristics. This dataset is publicly available for the research community to further advance handwritten digit recognition algorithms.

Place, publisher, year, edition, pages
Springer Nature Switzerland, 2019
Keywords
Handwritten digit recognition, ARDIS dataset, Machine learning methods, Benchmark
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:bth-17741 (URN); 10.1007/s00521-019-04163-3 (DOI)
Funder
Knowledge Foundation, 20140032
Available from: 2019-03-27 Created: 2019-03-27 Last updated: 2019-05-02 Bibliographically approved
Sidorova, Y., Rosander, O., Sköld, L., Grahn, H. & Lundberg, L. (2019). Finding a healthy equilibrium of geo-demographic segments for a telecom business: Who are malicious hot-spotters?. In: George A. Tsihrintzis, Dionisios N. Sotiropoulos, Lakhmi C. Jain (Ed.), Machine Learning Paradigms: Advances in Data Analytics (pp. 187-196). Springer Science and Business Media Deutschland GmbH
2019 (English). In: Machine Learning Paradigms: Advances in Data Analytics / [ed] George A. Tsihrintzis, Dionisios N. Sotiropoulos, Lakhmi C. Jain, Springer Science and Business Media Deutschland GmbH, 2019, p. 187-196. Chapter in book (Refereed)
Abstract [en]

In telecommunication business, a major investment goes into the infrastructure and its maintenance, while business revenues are proportional to how big, good, and well-balanced the customer base is. In our previous work we presented a data-driven analytic strategy based on combinatorial optimization and analysis of the historical mobility designed to quantify the desirability of different geo-demographic segments, and several segments were recommended for a partial reduction. Within a segment, clients are different. In order to enable intelligent reduction, we introduce the term infrastructure-stressing client and, using the proposed method, we reveal the list of the IDs of such clients. We also have developed a visualization tool to allow for manual checks: it shows how the client moved through a sequence of hot spots and was repeatedly served by critically loaded antennas. The code and the footprint matrix are available on the SourceForge. © 2019, Springer International Publishing AG, part of Springer Nature.

Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH, 2019
Series
Intelligent Systems Reference Library, ISSN 1868-4394 ; 149
Keywords
Business intelligence, Combinatorial optimization, Fuzzy logic, Geo-demographic segments, Mobility data, MOSAIC
National Category
Telecommunications; Business Administration; Computer Sciences
Identifiers
urn:nbn:se:bth-16885 (URN); 10.1007/978-3-319-94030-4_8 (DOI); 2-s2.0-85049522294 (Scopus ID); 978-3-319-94029-8 (ISBN)
Available from: 2018-08-20 Created: 2018-08-20 Last updated: 2018-08-20 Bibliographically approved
García Martín, E., Lavesson, N., Grahn, H., Casalicchio, E. & Boeva, V. (2019). How to Measure Energy Consumption in Machine Learning Algorithms. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): ECMLPKDD 2018: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases Workshops. Lecture Notes in Computer Science. Springer, Cham. Paper presented at European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2018; Dublin; Ireland; 10 September 2018 through 14 September 2018 (Vol. 11329, pp. 243-255).
2019 (English). In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): ECMLPKDD 2018: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases Workshops. Lecture Notes in Computer Science. Springer, Cham, 2019, Vol. 11329, p. 243-255. Conference paper, Published paper (Refereed)
Abstract [en]

Machine learning algorithms are responsible for a significant amount of computations. These computations are increasing with the advancements in different machine learning fields. For example, fields such as deep learning require algorithms to run during weeks consuming vast amounts of energy. While there is a trend in optimizing machine learning algorithms for performance and energy consumption, still there is little knowledge on how to estimate an algorithm’s energy consumption. Currently, a straightforward cross-platform approach to estimate energy consumption for different types of algorithms does not exist. For that reason, well-known researchers in computer architecture have published extensive works on approaches to estimate the energy consumption. This study presents a survey of methods to estimate energy consumption, and maps them to specific machine learning scenarios. Finally, we illustrate our mapping suggestions with a case study, where we measure energy consumption in a big data stream mining scenario. Our ultimate goal is to bridge the current gap that exists to estimate energy consumption in machine learning scenarios.

Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 11329
Keywords
Computer architecture, Energy efficiency, Green computing, Machine learning
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-17209 (URN); 10.1007/978-3-030-13453-2_20 (DOI); 9783030134525 (ISBN)
Conference
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2018; Dublin; Ireland; 10 September 2018 through 14 September 2018
Funder
Knowledge Foundation, 20140032
Available from: 2018-11-01 Created: 2018-11-01 Last updated: 2019-04-18 Bibliographically approved
Westphal, F., Lavesson, N. & Grahn, H. (2018). Document Image Binarization Using Recurrent Neural Networks. In: Proceedings - 13th IAPR International Workshop on Document Analysis Systems, DAS 2018. Paper presented at 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), Vienna (pp. 263-268). IEEE
2018 (English). In: Proceedings - 13th IAPR International Workshop on Document Analysis Systems, DAS 2018, IEEE, 2018, p. 263-268. Conference paper, Published paper (Refereed)
Abstract [en]

In the context of document image analysis, image binarization is an important preprocessing step for other document analysis algorithms, but also relevant on its own by improving the readability of images of historical documents. While historical document image binarization is challenging due to common image degradations, such as bleedthrough, faded ink or stains, achieving good binarization performance in a timely manner is a worthwhile goal to facilitate efficient information extraction from historical documents. In this paper, we propose a recurrent neural network based algorithm using Grid Long Short-Term Memory cells for image binarization, as well as a pseudo F-Measure based weighted loss function. We evaluate the binarization and execution performance of our algorithm for different choices of footprint size, scale factor and loss function. Our experiments show a significant trade-off between binarization time and quality for different footprint sizes. However, we see no statistically significant difference when using different scale factors and only limited differences for different loss functions. Lastly, we compare the binarization performance of our approach with the best performing algorithm in the 2016 handwritten document image binarization contest and show that both algorithms perform equally well.

Place, publisher, year, edition, pages
IEEE, 2018
Keywords
image binarization, recurrent neural networks, Grid LSTM, historical documents, Text analysis, Labeling, Recurrent neural networks, Heuristic algorithms, Training, Degradation, Ink
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:bth-16749 (URN); 10.1109/DAS.2018.71 (DOI); 000467070300045 (); 978-1-5386-3346-5 (ISBN)
Conference
2018 13th IAPR International Workshop on Document Analysis Systems (DAS), Vienna
Funder
Knowledge Foundation, 20140032
Available from: 2018-07-06 Created: 2018-07-06 Last updated: 2019-06-28 Bibliographically approved
Westphal, F., Grahn, H. & Lavesson, N. (2018). Efficient document image binarization using heterogeneous computing and parameter tuning. International Journal on Document Analysis and Recognition, 21(1-2), 41-58
2018 (English). In: International Journal on Document Analysis and Recognition, ISSN 1433-2833, E-ISSN 1433-2825, Vol. 21, no 1-2, p. 41-58. Article in journal (Refereed), Published
Abstract [en]

In the context of historical document analysis, image binarization is a first important step, which separates foreground from background, despite common image degradations, such as faded ink, stains, or bleed-through. Fast binarization has great significance when analyzing vast archives of document images, since even small inefficiencies can quickly accumulate to years of wasted execution time. Therefore, efficient binarization is especially relevant to companies and government institutions, who want to analyze their large collections of document images. The main challenge with this is to speed up the execution performance without affecting the binarization performance. We modify a state-of-the-art binarization algorithm and achieve on average a 3.5 times faster execution performance by correctly mapping this algorithm to a heterogeneous platform, consisting of a CPU and a GPU. Our proposed parameter tuning algorithm additionally improves the execution time for parameter tuning by a factor of 1.7, compared to previous parameter tuning algorithms. We see that for the chosen algorithm, machine learning-based parameter tuning improves the execution performance more than heterogeneous computing, when comparing absolute execution times. © 2018 The Author(s)

Place, publisher, year, edition, pages
Springer Verlag, 2018
Keywords
Automatic parameter tuning, Heterogeneous computing, Historical documents, Image binarization, Bins, History, Image analysis, Learning systems, Document image binarization, Government institutions, Heterogeneous platforms, Parameter tuning algorithm, Parameter estimation
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-15891 (URN); 10.1007/s10032-017-0293-7 (DOI); 000433193500003 (); 2-s2.0-85041228615 (Scopus ID)
Available from: 2018-02-15 Created: 2018-02-15 Last updated: 2018-08-27 Bibliographically approved
García Martín, E., Lavesson, N., Grahn, H., Casalicchio, E. & Boeva, V. (2018). Hoeffding Trees with nmin adaptation. In: The 5th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2018). Paper presented at 5th IEEE International Conference on Data Science and Advanced Analytics (IEEE DSAA), 1–4 October 2018, Turin (pp. 70-79). IEEE
2018 (English). In: The 5th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2018), IEEE, 2018, p. 70-79. Conference paper, Published paper (Refereed)
Abstract [en]

Machine learning software accounts for a significant amount of energy consumed in data centers. These algorithms are usually optimized towards predictive performance, i.e. accuracy, and scalability. This is the case of data stream mining algorithms. Although these algorithms are adaptive to the incoming data, they have fixed parameters from the beginning of the execution. We have observed that having fixed parameters leads to unnecessary computations, thus making the algorithm energy inefficient. In this paper we present the nmin adaptation method for Hoeffding trees. This method adapts the value of the nmin parameter, which significantly affects the energy consumption of the algorithm. The method reduces unnecessary computations and memory accesses, thus reducing the energy, while the accuracy is only marginally affected. We experimentally compared VFDT (Very Fast Decision Tree, the first Hoeffding tree algorithm) and CVFDT (Concept-adapting VFDT) with the VFDT-nmin (VFDT with nmin adaptation). The results show that VFDT-nmin consumes up to 27% less energy than the standard VFDT, and up to 92% less energy than CVFDT, trading off a few percent of accuracy in a few datasets.

Place, publisher, year, edition, pages
IEEE, 2018
Series
Proceedings of the International Conference on Data Science and Advanced Analytics, ISSN 2472-1573
Keywords
data stream mining; green artificial intelligence; energy efficiency; hoeffding trees; energy aware machine learning
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-17207 (URN); 10.1109/DSAA.2018.00017 (DOI); 000459238600008 (); 978-1-5386-5090-5 (ISBN)
Conference
5th IEEE International Conference on Data Science and Advanced Analytics (IEEE DSAA), 1–4 October 2018, Turin
Funder
Knowledge Foundation, 20140032
Available from: 2018-11-01 Created: 2018-11-01 Last updated: 2019-04-05 Bibliographically approved
Nordahl, C., Grahn, H., Persson, M. & Boeva, V. (2018). Organizing, Visualizing and Understanding Households Electricity Consumption Data through Clustering Analysis. In: Organizing, Visualizing and Understanding Households Electricity Consumption Data through Clustering Analysis. Paper presented at 2ND WORKSHOP ON AI FOR AGING, REHABILITATION AND INDEPENDENT ASSISTED LIVING (ARIAL) @IJCAI'18, Stockholm. https://sites.google.com/view/arial2018/accepted-papersprogram
2018 (English). In: Organizing, Visualizing and Understanding Households Electricity Consumption Data through Clustering Analysis, https://sites.google.com/view/arial2018/accepted-papersprogram, 2018. Conference paper, Published paper (Refereed)
Abstract [en]

We propose a cluster analysis approach for organizing, visualizing and understanding households’ electricity consumption data. We initially partition the consumption data into a number of clusters with similar daily electricity consumption profiles. The centroids of each cluster can be seen as representative signatures of a household’s electricity consumption behaviors. We evaluate the proposed approach by conducting a number of experiments on electricity consumption data of ten selected households. Our results show that the approach is suitable for data analysis, understanding and creating electricity consumption behavior models.

Place, publisher, year, edition, pages
https://sites.google.com/view/arial2018/accepted-papersprogram, 2018
National Category
Other Computer and Information Science
Identifiers
urn:nbn:se:bth-17439 (URN)
Conference
2ND WORKSHOP ON AI FOR AGING, REHABILITATION AND INDEPENDENT ASSISTED LIVING (ARIAL) @IJCAI'18, Stockholm
Projects
BigData@BTH
Available from: 2018-12-19 Created: 2018-12-19 Last updated: 2019-01-16 Bibliographically approved
Westphal, F., Grahn, H. & Lavesson, N. (2018). User Feedback and Uncertainty in User Guided Binarization. In: Tong, H; Li, Z; Zhu, F; Yu, J (Ed.), International Conference on Data Mining Workshops. Paper presented at 18th IEEE International Conference on Data Mining Workshops, ICDMW, Singapore; Singapore; 17 November 2018 through 20 November (pp. 403-410). IEEE Computer Society, Article ID 8637367.
2018 (English). In: International Conference on Data Mining Workshops / [ed] Tong, H; Li, Z; Zhu, F; Yu, J, IEEE Computer Society, 2018, p. 403-410, article id 8637367. Conference paper, Published paper (Refereed)
Abstract [en]

In a child’s development, the child’s inherent ability to construct knowledge from new information is as important as explicit instructional guidance. Similarly, mechanisms to produce suitable learning representations, which can be transferred and allow integration of new information are important for artificial learning systems. However, equally important are modes of instructional guidance, which allow the system to learn efficiently. Thus, the challenge for efficient learning is to identify suitable guidance strategies together with suitable learning mechanisms.

In this paper, we propose guided machine learning as source for suitable guidance strategies, we distinguish between sample selection based and privileged information based strategies and evaluate three sample selection based strategies on a simple transfer learning task. The evaluated strategies are random sample selection, i.e., supervised learning, user based sample selection based on readability, and user based sample selection based on readability and uncertainty. We show that sampling based on readability and uncertainty tends to produce better learning results than the other two strategies. Furthermore, we evaluate the use of the learner’s uncertainty for self directed learning and find that effects similar to the Dunning-Kruger effect prevent this use case. The learning task in this study is document image binarization, i.e., the separation of text foreground from page background and the source domain of the transfer are texts written on paper in Latin characters, while the target domain are texts written on palm leaves in Balinese script.

Place, publisher, year, edition, pages
IEEE Computer Society, 2018
Keywords
guided machine learning, interactive machine learning, image binarization, historical documents
National Category
Computer Vision and Robotics (Autonomous Systems); Human Computer Interaction
Identifiers
urn:nbn:se:bth-17742 (URN); 10.1109/ICDMW.2018.00066 (DOI); 000465766800058 (); 978-1-5386-9288-2 (ISBN)
Conference
18th IEEE International Conference on Data Mining Workshops, ICDMW, Singapore; Singapore; 17 November 2018 through 20 November
Funder
Knowledge Foundation, 20140032
Note

"© 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works."

Available from: 2019-03-27 Created: 2019-03-27 Last updated: 2019-07-01 Bibliographically approved
Martinsen, J. K., Grahn, H. & Isberg, A. (2017). Combining thread-level speculation and just-in-time compilation in Google’s V8 JavaScript engine. Concurrency and Computation, 29(1), Article ID e3826.
2017 (English). In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 29, no 1, article id e3826. Article in journal (Refereed), Published
Abstract [en]

Summary: Thread-level speculation can be used to take advantage of multicore architectures for JavaScript in web applications. We extend previous studies with these main contributions: we implement thread-level speculation in the state-of-the-art just-in-time-enabled JavaScript engine V8 and make the measurements in the Chromium web browser, both from Google, instead of using an interpreted JavaScript engine. We evaluate the thread-level speculation and just-in-time compilation combination on 15 very popular web applications, 20 HTML5 demos from the JS1K competition, and 4 Google Maps use cases. The performance is evaluated on two, four, and eight cores. The results clearly show that it is possible to successfully combine thread-level speculation and just-in-time compilation. This makes it possible to take advantage of multicore architectures for web applications while hiding the details of parallel programming from the programmer. Further, our results show an average speedup for the thread-level speculation and just-in-time compilation combination by a factor of almost 3 on four cores and over 4 on eight cores, without changing any of the JavaScript source code.

Place, publisher, year, edition, pages
Wiley Online Library, 2017
Keywords
Computer architecture; Computer programming; Engines; High level languages; Just in time production; Parallel programming; Software architecture; World Wide Web, Javascript; Just in time; Just-in-time compilation; Multicore architectures; Source codes; State of the art; Thread level speculation; WEB application, Multicore programming
National Category
Computer Engineering; Computer Sciences
Identifiers
urn:nbn:se:bth-13219 (URN); 10.1002/cpe.3826 (DOI); 000390562700002 (); 2-s2.0-84966359864 (Scopus ID)
Available from: 2016-10-03 Created: 2016-10-03 Last updated: 2018-02-02 Bibliographically approved
García Martín, E., Lavesson, N. & Grahn, H. (2017). Energy Efficiency Analysis of the Very Fast Decision Tree Algorithm. In: Rokia Missaoui, Talel Abdessalem, Matthieu Latapy (Ed.), Trends in Social Network Analysis: Information Propagation, User Behavior Modeling, Forecasting, and Vulnerability Assessment (pp. 229-252). Cham, Switzerland: Springer
2017 (English). In: Trends in Social Network Analysis: Information Propagation, User Behavior Modeling, Forecasting, and Vulnerability Assessment / [ed] Rokia Missaoui, Talel Abdessalem, Matthieu Latapy, Cham, Switzerland: Springer, 2017, p. 229-252. Chapter in book (Refereed)
Abstract [en]

Data mining algorithms are usually designed to optimize a trade-off between predictive accuracy and computational efficiency. This paper introduces energy consumption and energy efficiency as important factors to consider during data mining algorithm analysis and evaluation. We conducted an experiment to illustrate how energy consumption and accuracy are affected when varying the parameters of the Very Fast Decision Tree (VFDT) algorithm. These results are compared with a theoretical analysis on the algorithm, indicating that energy consumption is affected by the parameters design and that it can be reduced significantly while maintaining accuracy.

Place, publisher, year, edition, pages
Cham, Switzerland: Springer, 2017
Series
Lecture Notes in Social Networks, ISSN 2190-5428
Keywords
Energy efficiency, Green computing, Very Fast Decision Tree, Big Data
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-15489 (URN); 10.1007/978-3-319-53420-6_10 (DOI); 978-3-319-53419-0 (ISBN); 978-3-319-53420-6 (ISBN)
Funder
Knowledge Foundation, 20140032
Available from: 2017-11-14 Created: 2017-11-14 Last updated: 2018-02-02 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0001-9947-1088
