Publications (10 of 93)
Westphal, F., Grahn, H. & Lavesson, N. (2018). Efficient document image binarization using heterogeneous computing and parameter tuning. International Journal on Document Analysis and Recognition, 1-18.
2018 (English) In: International Journal on Document Analysis and Recognition, ISSN 1433-2833, E-ISSN 1433-2825, p. 1-18. Article in journal (Refereed). Epub ahead of print.
Abstract [en]

In the context of historical document analysis, image binarization is an important first step: it separates foreground from background despite common image degradations such as faded ink, stains, or bleed-through. Fast binarization is of great significance when analyzing vast archives of document images, since even small inefficiencies can quickly accumulate to years of wasted execution time. Efficient binarization is therefore especially relevant to companies and government institutions that want to analyze their large collections of document images. The main challenge is to speed up execution without degrading binarization quality. We modify a state-of-the-art binarization algorithm and achieve, on average, a 3.5 times faster execution by correctly mapping the algorithm to a heterogeneous platform consisting of a CPU and a GPU. Our proposed parameter tuning algorithm additionally improves the execution time for parameter tuning by a factor of 1.7 compared to previous parameter tuning algorithms. We see that for the chosen algorithm, machine learning-based parameter tuning improves execution performance more than heterogeneous computing when comparing absolute execution times.
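The paper speeds up an existing state-of-the-art algorithm rather than the simple method below; purely as an illustration of what binarization does (separating dark ink from light paper), here is a pure-Python sketch of Otsu's classic global threshold, a standard baseline and not the authors' method:

```python
def otsu_threshold(gray):
    """Return the Otsu threshold for a flat list of 8-bit gray values:
    the cut that maximizes between-class variance."""
    hist = [0] * 256
    for v in gray:
        hist[v] += 1
    total = len(gray)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]          # pixels at or below t -> background class
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (total_sum - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(gray, t):
    """Map gray values to 0 (ink/foreground) or 1 (background)."""
    return [0 if v <= t else 1 for v in gray]

# Two well-separated clusters: dark ink (~20) and light paper (~220).
pixels = [20, 25, 18, 22, 210, 220, 215, 225]
t = otsu_threshold(pixels)
print(binarize(pixels, t))  # → [0, 0, 0, 0, 1, 1, 1, 1]
```

Degradations such as bleed-through are exactly what makes a single global threshold like this insufficient in practice, which is why adaptive, parameter-tuned algorithms are needed.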

Place, publisher, year, edition, pages
Springer Verlag, 2018
Keyword
Automatic parameter tuning, Heterogeneous computing, Historical documents, Image binarization, Bins, History, Image analysis, Learning systems, Document image binarization, Government institutions, Heterogeneous platforms, Parameter tuning algorithm, Parameter estimation
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-15891 (URN), 10.1007/s10032-017-0293-7 (DOI), 2-s2.0-85041228615 (Scopus ID)
Available from: 2018-02-15. Created: 2018-02-15. Last updated: 2018-02-15. Bibliographically approved.
Martinsen, J. K., Grahn, H. & Isberg, A. (2017). Combining thread-level speculation and just-in-time compilation in Google’s V8 JavaScript engine. Concurrency and Computation, 29(1), Article ID e3826.
2017 (English) In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 29, no 1, article id e3826. Article in journal (Refereed). Published.
Abstract [en]

Thread-level speculation can be used to take advantage of multicore architectures for JavaScript in web applications. We extend previous studies with two main contributions: we implement thread-level speculation in the state-of-the-art just-in-time-enabled JavaScript engine V8, and we take the measurements in the Chromium web browser (both from Google), instead of using an interpreted JavaScript engine. We evaluate the combination of thread-level speculation and just-in-time compilation on 15 very popular web applications, 20 HTML5 demos from the JS1K competition, and 4 Google Maps use cases. The performance is evaluated on two, four, and eight cores. The results clearly show that thread-level speculation and just-in-time compilation can be successfully combined, making it possible to take advantage of multicore architectures for web applications while hiding the details of parallel programming from the programmer. Further, our results show an average speedup for the combination by a factor of almost 3 on four cores and over 4 on eight cores, without changing any of the JavaScript source code.
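A toy model of the underlying idea, not V8's implementation: tasks run optimistically in parallel from a common snapshot, commits happen in program order, and a task whose reads were invalidated by an earlier task's writes is squashed and re-executed. All names below are hypothetical.

```python
import threading

def run_speculative(tasks, init):
    """Toy model of thread-level speculation.

    Each task maps a read-only state snapshot to (read_keys, writes_dict).
    All tasks start in parallel from the same snapshot (the speculation);
    commits happen in program order, and a task that read a key written by
    an earlier task is squashed and re-executed on up-to-date state.
    """
    snapshot = dict(init)
    results = [None] * len(tasks)

    def worker(i, task):
        results[i] = task(snapshot)

    threads = [threading.Thread(target=worker, args=(i, t))
               for i, t in enumerate(tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    state, squashes = dict(init), 0
    for task, (reads, writes) in zip(tasks, results):
        if any(state.get(k) != snapshot.get(k) for k in reads):
            reads, writes = task(state)   # misspeculation: rollback + redo
            squashes += 1
        state.update(writes)
    return state, squashes

# The second "iteration" reads x, which the first writes -> one squash.
inc_x = lambda s: ([], {"x": s["x"] + 1})
use_x = lambda s: (["x"], {"y": s["x"] * 10})
final, squashes = run_speculative([inc_x, use_x], {"x": 1, "y": 0})
print(final, squashes)  # → {'x': 2, 'y': 20} 1
```

The payoff reported in the paper comes when squashes are rare, so most iterations genuinely run in parallel; the cost of a squash is the wasted speculative execution plus the sequential redo.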

Place, publisher, year, edition, pages
Wiley Online Library, 2017
Keyword
Computer architecture, Computer programming, Engines, High level languages, Just in time production, Parallel programming, Software architecture, World Wide Web, JavaScript, Just in time, Just-in-time compilation, Multicore architectures, Source codes, State of the art, Thread level speculation, Web application, Multicore programming
National Category
Computer Engineering, Computer Sciences
Identifiers
urn:nbn:se:bth-13219 (URN), 10.1002/cpe.3826 (DOI), 000390562700002 (), 2-s2.0-84966359864 (Scopus ID)
Available from: 2016-10-03. Created: 2016-10-03. Last updated: 2018-02-02. Bibliographically approved.
García Martín, E., Lavesson, N. & Grahn, H. (2017). Energy Efficiency Analysis of the Very Fast Decision Tree Algorithm. In: Rokia Missaoui, Talel Abdessalem, Matthieu Latapy (Ed.), Trends in Social Network Analysis: Information Propagation, User Behavior Modeling, Forecasting, and Vulnerability Assessment (pp. 229-252). Cham, Switzerland: Springer.
2017 (English) In: Trends in Social Network Analysis: Information Propagation, User Behavior Modeling, Forecasting, and Vulnerability Assessment / [ed] Rokia Missaoui, Talel Abdessalem, Matthieu Latapy, Cham, Switzerland: Springer, 2017, p. 229-252. Chapter in book (Refereed).
Abstract [en]

Data mining algorithms are usually designed to optimize a trade-off between predictive accuracy and computational efficiency. This chapter introduces energy consumption and energy efficiency as important factors to consider during data mining algorithm analysis and evaluation. We conducted an experiment to illustrate how energy consumption and accuracy are affected when varying the parameters of the Very Fast Decision Tree (VFDT) algorithm. These results are compared with a theoretical analysis of the algorithm, indicating that energy consumption is affected by the parameter settings and that it can be reduced significantly while maintaining accuracy.
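VFDT grows a decision tree from a stream and decides splits using the Hoeffding bound, so the parameters being varied (the confidence δ, the tie threshold, the number of examples seen between checks) directly control how much computation, and hence energy, is spent. A minimal sketch of the standard split test (the textbook VFDT formula, not code from the chapter; the example numbers are hypothetical):

```python
import math

def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)): with probability 1 - delta,
    the true mean of n observations lies within epsilon of the sample mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(g_best, g_second, value_range, delta, n, tie_threshold=0.05):
    """VFDT split rule: split when the observed gain advantage of the best
    attribute exceeds epsilon, or epsilon shrinks below the tie threshold."""
    eps = hoeffding_bound(value_range, delta, n)
    return (g_best - g_second) > eps or eps < tie_threshold

# More examples n shrink epsilon, so splits are decided earlier; delta and
# the check frequency are exactly the kinds of parameters whose effect on
# energy and accuracy the chapter measures.
print(should_split(0.30, 0.10, 1.0, 1e-7, n=100))  # → False (epsilon too large)
print(should_split(0.30, 0.10, 1.0, 1e-7, n=300))  # → True
```

Checking the bound less often, or with a looser δ, means fewer passes over the split-evaluation code, which is the intuition behind trading a little accuracy for large energy savings.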

Place, publisher, year, edition, pages
Cham, Switzerland: Springer, 2017
Series
Lecture Notes in Social Networks, ISSN 2190-5428
Keyword
Energy efficiency, Green computing, Very Fast Decision Tree, Big Data
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-15489 (URN), 10.1007/978-3-319-53420-6_10 (DOI), 978-3-319-53419-0 (ISBN), 978-3-319-53420-6 (ISBN)
Funder
Knowledge Foundation, 20140032
Available from: 2017-11-14. Created: 2017-11-14. Last updated: 2018-02-02. Bibliographically approved.
García Martín, E., Lavesson, N. & Grahn, H. (2017). Identification of Energy Hotspots: A Case Study of the Very Fast Decision Tree. In: Au M., Castiglione A., Choo KK., Palmieri F., Li KC. (Ed.), GPC 2017: Green, Pervasive, and Cloud Computing. Paper presented at International Conference on Green, Pervasive and Cloud Computing (GPC), Cetara, Amalfi Coast, Italy (pp. 267-281). Cham, Switzerland: Springer, 10232.
2017 (English) In: GPC 2017: Green, Pervasive, and Cloud Computing / [ed] Au M., Castiglione A., Choo KK., Palmieri F., Li KC., Cham, Switzerland: Springer, 2017, Vol. 10232, p. 267-281. Conference paper, Published paper (Refereed).
Abstract [en]

Large-scale data centers account for a significant share of the energy consumption in many countries. Machine learning technology requires intensive workloads and thus demands substantial power and cooling capacity in data centers. It is time to explore green machine learning. The aim of this paper is to profile a machine learning algorithm with respect to its energy consumption and to determine the causes behind this consumption. The Very Fast Decision Tree (VFDT) was the first scalable machine learning algorithm able to handle large volumes of streaming data, and it outputs competitive results in comparison to algorithms that analyze data from static datasets. Our objectives are to: (i) establish a methodology that profiles the energy consumption of decision trees at the function level, (ii) apply this methodology in an experiment to obtain the energy consumption of the VFDT, (iii) conduct a fine-grained analysis of the functions that consume most of the energy, providing an understanding of that consumption, and (iv) analyze how different parameter settings can significantly reduce the energy consumption. The results show that by addressing the most energy-intensive part of the VFDT, the energy consumption can be reduced by up to 74.3%.
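The paper attributes measured energy to individual functions; the sketch below is only a software proxy for that methodology, attributing profiled self time per function and converting it with an assumed constant power draw. The workload functions and the 20 W figure are hypothetical, not from the paper.

```python
import cProfile
import pstats

def profile_functions(workload):
    """Run `workload` under cProfile and return {function_name: self_time_s}.

    A crude per-function energy estimate is then self time multiplied by an
    assumed average power draw (watts), giving joules -- a stand-in for real
    per-function energy attribution.
    """
    pr = cProfile.Profile()
    pr.enable()
    workload()
    pr.disable()
    # pstats entries are (cc, nc, tottime, cumtime, callers); index 2 = self time
    return {fn[2]: st[2] for fn, st in pstats.Stats(pr).stats.items()}

def hot():          # deliberately compute-heavy
    s = 0
    for i in range(200_000):
        s += i * i
    return s

def cold():         # deliberately cheap
    return len("abc")

def workload():
    hot()
    cold()

times = profile_functions(workload)
hotspot = max(times, key=times.get)
print(hotspot)                              # 'hot' dominates the self time
est_energy_joules = times[hotspot] * 20.0   # assuming a 20 W average draw
```

Identifying such a hotspot is the prerequisite for step (iv) above: once the dominant function is known, parameter settings that reduce how often it runs reduce total energy.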

Place, publisher, year, edition, pages
Cham, Switzerland: Springer, 2017
Series
Lecture Notes in Computer Science
Keyword
Machine learning, Big data, Very Fast Decision Tree, Green machine learning, Data mining, Data stream mining
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-15490 (URN), 10.1007/978-3-319-57186-7_21 (DOI), 978-3-319-57185-0 (ISBN), 978-3-319-57186-7 (ISBN)
Conference
International Conference on Green, Pervasive and Cloud Computing (GPC), Cetara, Amalfi Coast, Italy
Funder
Knowledge Foundation, 20140032
Available from: 2017-11-14. Created: 2017-11-14. Last updated: 2018-02-02. Bibliographically approved.
Cheddad, A., Kusetogullari, H. & Grahn, H. (2017). Object recognition using shape growth pattern. In: International Symposium on Image and Signal Processing and Analysis, ISPA. Paper presented at 10th International Symposium on Image and Signal Processing and Analysis (ISPA), Ljubljana (pp. 47-52). IEEE Computer Society Digital Library, Article ID 8073567.
2017 (English) In: International Symposium on Image and Signal Processing and Analysis, ISPA, IEEE Computer Society Digital Library, 2017, p. 47-52, article id 8073567. Conference paper, Published paper (Refereed).
Abstract [en]

This paper proposes a preprocessing stage to augment the bank of features that can be retrieved from binary images, to help increase the accuracy of pattern recognition algorithms. To this end, by applying successive dilations to a given shape, we capture a new dimension of its vital characteristics, which we hereafter term the shape growth pattern (SGP). This work investigates the feasibility of this notion and builds upon our prior work on structure-preserving dilation using Delaunay triangulation. Experiments on two public data sets are conducted, including comparisons to existing algorithms. We deploy two renowned machine learning methods in the classification process, convolutional neural networks (CNN) and random forests (RF), since they perform well in pattern recognition tasks. The results show a clear improvement in the proposed approach's classification accuracy (especially for data sets with limited training samples) as well as robustness against noise when compared to existing methods.
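The paper's SGP relies on the authors' structure-preserving dilation via Delaunay triangulation; the sketch below substitutes ordinary morphological dilation with a 3x3 structuring element, purely to show the shape of the feature: the foreground area recorded after each successive dilation becomes a growth signature of the shape.

```python
def dilate(img):
    """One binary dilation of a 2D 0/1 grid with a 3x3 structuring element
    (8-neighbourhood); plain morphology, not the paper's structure-preserving
    variant."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx]:
                        out[y][x] = 1
    return out

def shape_growth_pattern(img, steps):
    """Feature vector: foreground area after each successive dilation."""
    areas = []
    for _ in range(steps):
        img = dilate(img)
        areas.append(sum(map(sum, img)))
    return areas

shape = [[0] * 5 for _ in range(5)]
shape[2][2] = 1                          # a single-pixel "shape"
print(shape_growth_pattern(shape, 2))    # → [9, 25]
```

Different shapes grow at different rates under dilation, so the resulting vector can be fed to a classifier (CNN or RF in the paper) alongside conventional features.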

Place, publisher, year, edition, pages
IEEE Computer Society Digital Library, 2017
Keyword
Binary image dilations, convolutional neural network, machine learning, pattern recognition, shape growth pattern
National Category
Computer Systems, Signal Processing
Identifiers
urn:nbn:se:bth-15416 (URN), 10.1109/ISPA.2017.8073567 (DOI), 978-1-5090-4011-7 (ISBN)
Conference
10th International Symposium on Image and Signal Processing and Analysis (ISPA), Ljubljana
Projects
Scalable resource efficient systems for big data analytics
Available from: 2017-11-01. Created: 2017-11-01. Last updated: 2018-02-02. Bibliographically approved.
Abghari, S., García Martín, E., Johansson, C., Lavesson, N. & Grahn, H. (2017). Trend analysis to automatically identify heat program changes. In: Energy Procedia. Paper presented at 15th International Symposium on District Heating and Cooling (DHC2016), Seoul (pp. 407-415). Elsevier, 116.
2017 (English) In: Energy Procedia, Elsevier, 2017, Vol. 116, p. 407-415. Conference paper, Published paper (Refereed).
Abstract [en]

The aim of this study is to improve the monitoring and controlling of heating systems located in customer buildings through the use of a decision support system. To achieve this, the proposed system applies a two-step classifier to detect manual changes to the temperature of the heating system. We use data from the Swedish company NODA, active in energy optimization and services for energy efficiency, to train and test the suggested system. The decision support system is evaluated through an experiment, and the results are validated by experts at NODA. The results show that the decision support system can detect changes within three days of their occurrence, using only daily average measurements.
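The actual system is a trained two-step classifier; as a hypothetical illustration of the task itself (detecting a manual setpoint change from daily averages only), a simple mean-shift test suffices. All values and the threshold below are invented for the example.

```python
def detect_change(daily_avgs, window=3, threshold=2.0):
    """Flag a manual heat-program change: report the first day on which the
    mean of the last `window` daily averages differs from the mean of all
    earlier days by more than `threshold` degrees. Returns the day index
    (so detection lags the change by at most `window` days) or None."""
    for day in range(window, len(daily_avgs)):
        recent = daily_avgs[day - window + 1:day + 1]
        earlier = daily_avgs[:day - window + 1]
        if abs(sum(recent) / window - sum(earlier) / len(earlier)) > threshold:
            return day
    return None

# Setpoint manually raised by ~4 degrees on day 5; the shift is flagged on
# day 6, i.e. within the paper's three-day detection window, using daily
# averages only.
temps = [40.1, 40.0, 39.9, 40.2, 40.0, 44.0, 44.1, 43.9, 44.0]
print(detect_change(temps))  # → 6
```

A real deployment needs the classifier stage to separate genuine program changes from weather-driven variation, which is what the trained system contributes over a fixed threshold.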

Place, publisher, year, edition, pages
Elsevier, 2017
Series
Energy Procedia, ISSN 1876-6102 ; 116
Keyword
District heating, Trend analysis, Change detection, Smart automated system
National Category
Computer Systems
Identifiers
urn:nbn:se:bth-12894 (URN), 10.1016/j.egypro.2017.05.088 (DOI), 000406743000039 ()
Conference
15th International Symposium on District Heating and Cooling (DHC2016), Seoul
Projects
BigData@BTH
Funder
Knowledge Foundation, 20140032
Note
Open access
Available from: 2016-09-26. Created: 2016-07-13. Last updated: 2018-02-02. Bibliographically approved.
Petersson, S., Grahn, H. & Rasmusson, J. (2016). Color demosaicing using structural instability. In: Proceedings - 2016 IEEE International Symposium on Multimedia, ISM 2016. Paper presented at 18th IEEE International Symposium on Multimedia (ISM), San José, USA (pp. 541-544). Institute of Electrical and Electronics Engineers (IEEE).
2016 (English) In: Proceedings - 2016 IEEE International Symposium on Multimedia, ISM 2016, Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 541-544. Conference paper, Published paper (Refereed).
Abstract [en]

This paper introduces a new metric for approximating structural instability in Bayer image data. We show that the metric can be used to identify and classify the validity of color correlation in local image regions. The metric is used to improve the interpolation performance of an existing state-of-the-art single-pass linear demosaicing algorithm, with virtually no impact on GPGPU computational complexity and performance. Using four different image sets, the modification is shown to outperform the original method in terms of visual quality, with an average increase in PSNR of 0.7 dB in the red, 1.5 dB in the green, and 0.6 dB in the blue channel. Because of fewer high-frequency artifacts, the average output data size also decreases by 2.5%.
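PSNR, the quality metric reported above, is computed as follows; the pixel values are hypothetical, not from the paper's image sets. For scale, a gain of 0.7 dB corresponds to about a 15% lower mean squared error, since 10^(0.7/10) ≈ 1.17.

```python
import math

def psnr(original, reconstructed, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists:
    10 * log10(max_val^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

ref = [100, 120, 140, 160]    # hypothetical ground-truth channel values
out = [101, 121, 139, 161]    # hypothetical demosaiced output (MSE = 1)
print(round(psnr(ref, out), 1))  # → 48.1
```

In a demosaicing evaluation, PSNR is computed per color channel against a full-resolution reference, which is why the paper reports separate red, green, and blue figures.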

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2016
Keyword
Bayer, Demosaicing, GPGPU, Linear, Structural instability, Program processors
National Category
Computer Systems
Identifiers
urn:nbn:se:bth-14066 (URN), 10.1109/ISM.2016.0119 (DOI), 000399166000109 (), 2-s2.0-85015170613 (Scopus ID), 9781509045709 (ISBN)
Conference
18th IEEE International Symposium on Multimedia (ISM), San José, USA
Available from: 2017-03-30. Created: 2017-03-30. Last updated: 2018-02-02. Bibliographically approved.
Danielsson, M., Sievert, T., Grahn, H. & Rasmusson, J. (2016). Feature Detection and Description using a Harris-Hessian/FREAK Combination on an Embedded GPU. Paper presented at 5th Int’l Conf. on Pattern Recognition Applications and Methods (ICPRAM), Rome. Springer.
2016 (English) Conference paper, Published paper (Refereed).
Abstract [en]

GPUs in embedded platforms are reaching performance levels comparable to desktop hardware, so it becomes interesting to apply computer vision techniques to them. We propose, implement, and evaluate a novel feature detector and descriptor combination: the Harris-Hessian detector with the FREAK binary descriptor. The implementation is done in OpenCL, and we evaluate the execution time and classification performance. We compare our approach with two other methods, FAST/BRISK and ORB. Performance data is presented for the mobile device Xperia Z3 and the desktop Nvidia GTX 660. Our results indicate that the execution times on the Xperia Z3 are insufficient for real-time applications, while desktop execution shows future potential. The classification performance of Harris-Hessian/FREAK indicates that the solution is sensitive to rotation but superior on scale-variant images.
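FREAK (like BRISK and ORB) produces binary descriptors, which are compared with the Hamming distance; that is what makes these descriptors cheap to match on constrained hardware. A minimal brute-force matcher in that style, with toy 16-bit descriptors (real FREAK descriptors are 512-bit, and the paper's implementation is in OpenCL, not Python):

```python
def hamming(d1, d2):
    """Hamming distance between two binary descriptors packed into ints:
    XOR, then count the set bits."""
    return bin(d1 ^ d2).count("1")

def match(descriptors_a, descriptors_b, max_dist=64):
    """Brute-force nearest-neighbour matching of binary descriptors.
    Returns (index_a, index_b, distance) for each match under max_dist."""
    matches = []
    for i, da in enumerate(descriptors_a):
        j, dist = min(((j, hamming(da, db)) for j, db in enumerate(descriptors_b)),
                      key=lambda p: p[1])
        if dist <= max_dist:
            matches.append((i, j, dist))
    return matches

a = [0b1010101010101010, 0b1111000011110000]
b = [0b1010101010101011, 0b0000111100001111]
print(match(a, b, max_dist=4))  # → [(0, 0, 1)]
```

The second descriptor in `a` finds no partner within the distance budget, illustrating how the threshold rejects poor matches.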

Place, publisher, year, edition, pages
Springer, 2016
Keyword
GPU, Feature Detection, Feature Description, Mobile devices
National Category
Computer Sciences, Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:bth-11444 (URN)
Conference
5th Int’l Conf. on Pattern Recognition Applications and Methods (ICPRAM), Rome
Projects
BigData@BTH - Scalable resource-efficient systems for big data analytics; Industrial Excellence Center EASE - Embedded Applications Software Engineering
Funder
Knowledge Foundation, 20140032; VINNOVA
Available from: 2016-01-19. Created: 2016-01-19. Last updated: 2018-02-02. Bibliographically approved.
Nilsson, E., Aarno, D., Carstensen, E. & Grahn, H. (2015). Accelerating Graphics in the Simics Full-System Simulator. Paper presented at 23rd International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), Atlanta (pp. 150-153). IEEE Computer Society.
2015 (English) Conference paper, Published paper (Refereed).
Abstract [en]

Virtual platforms benefit developers with a more rapid development cycle, since development may begin before next-generation hardware is available. However, there is a distinct lack of graphics virtualization in industry-grade virtual platforms, leading to performance issues that may reduce the benefits virtual platforms otherwise have over execution on actual hardware. This paper demonstrates graphics acceleration by means of paravirtualizing OpenGL ES in the Wind River Simics full-system simulator. We propose a solution for paravirtualized graphics that uses magic instructions to share memory between target and host systems, and present an implementation utilizing this method. The study illustrates the benefits and drawbacks of paravirtualized graphics acceleration and presents a performance analysis of strengths and weaknesses compared to software rasterization. Additionally, benchmarks are devised to stress key aspects of the solution, such as communication latency and computationally intensive applications. We assess paravirtualization to be a viable method to accelerate graphics in system simulators; it reduces frame times by up to a factor of 34 compared to software rasterization. Furthermore, magic instructions are identified as the primary bottleneck of communication latency in the implementation.

Place, publisher, year, edition, pages
IEEE Computer Society, 2015
Keyword
circuit simulation, computer graphics, virtualisation, Wind River Simics full-system simulator, communication latency, computationally intensive applications, graphics virtualization, industry-grade virtual platforms, magic instructions, next-generation hardware, paravirtualized graphics acceleration, paravirtualizing OpenGL ES, rapid development cycle, share memory, software rasterization, system simulators, Acceleration, Benchmark testing, Hardware, Libraries, Software, Virtualization, full-system simulation, paravirtualization, simics
National Category
Computer Engineering, Computer Sciences
Identifiers
urn:nbn:se:bth-11413 (URN), 10.1109/MASCOTS.2015.33 (DOI), 000380555200022 ()
Conference
23rd International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), Atlanta
Available from: 2016-01-14. Created: 2016-01-14. Last updated: 2018-02-02. Bibliographically approved.
Lundberg, L., Grahn, H., Ilie, D. & Melander, C. (2015). Cache Support in a High Performance Fault-Tolerant Distributed Storage System for Cloud and Big Data. In: 2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops. Paper presented at IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW), Hyderabad (pp. 537-546). IEEE Computer Society.
2015 (English) In: 2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops, IEEE Computer Society, 2015, p. 537-546. Conference paper, Published paper (Refereed).
Abstract [en]

Due to the trends towards Big Data and cloud computing, one would like to provide large storage systems that are accessible by many servers. A shared storage can, however, become a performance bottleneck and a single point of failure. Distributed storage systems provide shared storage to the outside world, but internally they consist of a network of servers and disks, thus avoiding the performance bottleneck and single-point-of-failure problems. We introduce a cache in a distributed storage system. The cache system must be fault tolerant so that no data is lost in case of a hardware failure. This requirement excludes the use of the common write-invalidate cache consistency protocols. The cache is implemented and evaluated in two steps. The first step focuses on design decisions that improve performance when only one server uses the same file. In the second step we extend the cache with features that focus on the case when more than one server accesses the same file. The cache improves the throughput significantly compared to having no cache. The two-step evaluation approach makes it possible to quantify how different design decisions affect the performance of different use cases.
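The durability requirement that rules out write-invalidate protocols essentially forces writes to reach stable storage before (or as) they enter the cache. A toy write-through sketch of that constraint, not the evaluated system's actual design; the `disk` dict stands in for the distributed backing store.

```python
class WriteThroughCache:
    """Minimal write-through cache: every write goes to the backing store
    immediately, so losing the cache hardware loses no data -- the property
    that a write-back or write-invalidate design would not give for free."""

    def __init__(self, backing_store):
        self.store = backing_store   # stands in for the distributed storage
        self.cache = {}
        self.hits = self.misses = 0

    def write(self, key, value):
        self.store[key] = value      # durable first (write-through)
        self.cache[key] = value

    def read(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.store[key]      # slow path: fetch from backing storage
        self.cache[key] = value
        return value

disk = {"blk0": b"data"}
c = WriteThroughCache(disk)
c.read("blk0")
c.read("blk0")                       # first read misses, second hits
c.write("blk1", b"new")
print(c.hits, c.misses, disk["blk1"])  # → 1 1 b'new'
```

The harder multi-server case in the paper adds the question of keeping several such caches consistent when they share a file, which a single-node sketch like this does not address.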

Place, publisher, year, edition, pages
IEEE Computer Society, 2015
Keyword
big data, cloud, distributed storage systems, cache, performance evaluation
National Category
Computer Engineering
Identifiers
urn:nbn:se:bth-11411 (URN), 10.1109/IPDPSW.2015.65 (DOI), 000380446100062 (), 978-1-4673-9739-1 (ISBN)
Conference
IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW), Hyderabad
Projects
Bigdata@BTH - Scalable resource-efficient systems for big data analytics
Funder
Knowledge Foundation
Available from: 2016-01-14. Created: 2016-01-14. Last updated: 2018-02-02. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0001-9947-1088
