  • 1.
    Abghari, Shahrooz
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Boeva, Veselka
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Gustafsson, Jörgen
    Ericsson AB.
    Shaikh, Junaid
    Ericsson AB.
    Outlier Detection for Video Session Data Using Sequential Pattern Mining. 2018. In: ACM SIGKDD Workshop On Outlier Detection De-constructed, 2018. Conference paper (Refereed)
    Abstract [en]

    The growth of Internet video and over-the-top transmission techniques has enabled online video service providers to deliver high-quality video content to viewers. To maintain and improve the quality of experience, video providers need to detect unexpected issues that can highly affect the viewers’ experience. This requires analyzing massive amounts of video session data in order to find unexpected sequences of events. In this paper we combine sequential pattern mining and clustering to discover such event sequences. The proposed approach applies sequential pattern mining to find frequent patterns by considering contextual and collective outliers. In order to distinguish between the normal and abnormal behavior of the system, we initially identify the most frequent patterns. Then a clustering algorithm is applied on the most frequent patterns. The generated clustering model together with Silhouette Index are used for further analysis of less frequent patterns and detection of potential outliers. Our results show that the proposed approach can detect outliers at the system level.

  • 2.
    Abghari, Shahrooz
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Boeva, Veselka
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Ickin, Selim
    Ericsson, SWE.
    Gustafsson, Jörgen
    Ericsson, SWE.
    A Minimum Spanning Tree Clustering Approach for Outlier Detection in Event Sequences. 2018. In: 2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA) / [ed] Wani M.A., Sayed-Mouchaweh M., Lughofer E., Gama J., Kantardzic M., IEEE, 2018, p. 1123-1130, article id 8614207. Conference paper (Refereed)
    Abstract [en]

    Outlier detection has been studied in many domains. Outliers arise due to different reasons such as mechanical issues, fraudulent behavior, and human error. In this paper, we propose an unsupervised approach for outlier detection in a sequence dataset. The proposed approach combines sequential pattern mining, cluster analysis, and a minimum spanning tree algorithm in order to identify clusters of outliers. Initially, the sequential pattern mining is used to extract frequent sequential patterns. Next, the extracted patterns are clustered into groups of similar patterns. Finally, the minimum spanning tree algorithm is used to find groups of outliers. The proposed approach has been evaluated on two different real datasets, i.e., smart meter data and video session data. The obtained results have shown that our approach can be applied to narrow down the space of events to a set of potential outliers and facilitate domain experts in further analysis and identification of system level issues.
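
The minimum spanning tree step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the items are plain numeric points compared by Euclidean distance, and the names `mst_edges`, `small_components`, `cut_factor`, and `max_size` are hypothetical. The idea shown is to build an MST, cut edges that are much longer than average, and flag the small groups that remain as potential outliers.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (weight, i, j) edges of a minimum spanning tree.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known cost to attach each point
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest point that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def small_components(points, cut_factor=2.0, max_size=2):
    """Cut MST edges longer than cut_factor * mean edge length.

    Returns the index groups of size <= max_size (assumed outlier groups).
    Needs at least two points so that the MST has an edge.
    """
    edges = mst_edges(points)
    mean_w = sum(w for w, _, _ in edges) / len(edges)
    root = list(range(len(points)))  # union-find over surviving edges

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for w, i, j in edges:
        if w <= cut_factor * mean_w:
            root[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) <= max_size)


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11)]
print(small_components(points))
```

The cut threshold and the size limit are arbitrary illustration values; in practice they would be chosen with domain experts.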

  • 3.
    Abghari, Shahrooz
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    García Martín, Eva
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Johansson, Christian
    NODA Intelligent Systems AB, SWE.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Trend analysis to automatically identify heat program changes. 2017. In: Energy Procedia, Elsevier, 2017, Vol. 116, p. 407-415. Conference paper (Refereed)
    Abstract [en]

    The aim of this study is to improve the monitoring and controlling of heating systems located at customer buildings through the use of a decision support system. To achieve this, the proposed system applies a two-step classifier to detect manual changes of the temperature of the heating system. We apply data from the Swedish company NODA, active in energy optimization and services for energy efficiency, to train and test the suggested system. The decision support system is evaluated through an experiment and the results are validated by experts at NODA. The results show that the decision support system can detect changes within three days after their occurrence and only by considering daily average measurements.

  • 4. Allahyari, Hiva
    et al.
    Lavesson, Niklas
    User-oriented Assessment of Classification Model Understandability. 2011. Conference paper (Refereed)
    Abstract [en]

    This paper reviews methods for evaluating and analyzing the understandability of classification models in the context of data mining. The motivation for this study is the fact that the majority of previous work has focused on increasing the accuracy of models, ignoring user-oriented properties such as comprehensibility and understandability. Approaches for analyzing the understandability of data mining models have been discussed on two different levels: one is regarding the type of the models’ presentation and the other is considering the structure of the models. In this study, we present a summary of existing assumptions regarding both approaches followed by an empirical work to examine the understandability from the user’s point of view through a survey. The results indicate that decision tree models are more understandable than rule-based models. Using the survey results regarding understandability of a number of models in conjunction with quantitative measurements of the complexity of the models, we are able to establish correlation between complexity and understandability of the models.

  • 5.
    Angelova, Milena
    et al.
    Technical University of Sofia-branch Plovdiv, BUL.
    Vishnu Manasa, Devagiri
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Boeva, Veselka
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Linde, Peter
    Blekinge Institute of Technology, The Library.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    An Expertise Recommender System Based on Data from an Institutional Repository (DiVA). 2018. In: Proceedings of the 22nd edition of the International Conference on ELectronic PUBlishing, 2018. Conference paper (Refereed)
    Abstract [en]

    Finding experts in academia is an important practical problem, e.g. recruiting reviewers for reviewing conference, journal or project submissions, partner matching for research proposals, finding relevant M. Sc. or Ph. D. supervisors etc. In this work, we discuss an expertise recommender system that is built on data extracted from the Blekinge Institute of Technology (BTH) instance of the institutional repository system DiVA (Digital Scientific Archive). DiVA is a publication and archiving platform for research publications and student essays used by 46 publicly funded universities and authorities in Sweden and the rest of the Nordic countries (www.diva-portal.org). The DiVA classification system is based on the three-level classification system of the Swedish Higher Education Authority (UKÄ) and Statistics Sweden (SCB). Using the classification terms associated with student M. Sc. and B. Sc. theses published in the DiVA platform, we have developed a prototype system which can be used to identify and recommend subject thesis supervisors in academia.

  • 6.
    Angelova, Milena
    et al.
    Technical University of Sofia, BUL.
    Vishnu Manasa, Devagiri
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Boeva, Veselka
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Linde, Peter
    Blekinge Institute of Technology, The Library.
    Lavesson, Niklas
    An Expertise Recommender System based on Data from an Institutional Repository (DiVA). 2019. In: Connecting the Knowledge Common from Projects to sustainable Infrastructure: The 22nd International conference on Electronic Publishing - Revised Selected Papers / [ed] Leslie Chan, Pierre Mounier, OpenEdition Press, 2019, p. 135-149. Chapter in book (Refereed)
    Abstract [en]

    Finding experts in academia is an important practical problem, e.g. recruiting reviewers for reviewing conference, journal or project submissions, partner matching for research proposals, finding relevant M. Sc. or Ph. D. supervisors etc. In this work, we discuss an expertise recommender system that is built on data extracted from the Blekinge Institute of Technology (BTH) instance of the institutional repository system DiVA (Digital Scientific Archive). DiVA is a publication and archiving platform for research publications and student essays used by 46 publicly funded universities and authorities in Sweden and the rest of the Nordic countries (www.diva-portal.org). The DiVA classification system is based on the three-level classification system of the Swedish Higher Education Authority (UKÄ) and Statistics Sweden (SCB). Using the classification terms associated with student M. Sc. and B. Sc. theses published in the DiVA platform, we have developed a prototype system which can be used to identify and recommend subject thesis supervisors in academia.

  • 7. Baptista, Ana Alice
    et al.
    Linde, Peter
    Blekinge Institute of Technology, The Library.
    Lavesson, Niklas
    Blekinge Institute of Technology, School of Computing.
    Brito, Miguel Abrunhosa de
    Social Shaping of Digital Publishing: Exploring the Interplay Between Culture and Technology - Proceedings of the 16th International Conference on Electronic Publishing. 2012. Collection (editor) (Other academic)
    Abstract [en]

    Since the advent of the Web, the processes and forms of electronic publishing have been changing. The open access movement has been a major driver of change in recent years with regard to scholarly communication; however, changes are also evident in other fields of application such as e-government and e-learning. In most cases these changes are driven by technological advances, but there are also cases where a change in social reality pushes technological development. Both the social and mobile web and linked data are currently shaping the edge of research in digital publishing. Liquid publishing is on the more daring agendas. Digital preservation is an issue that poses great challenges which are still far from being solved. The legal issues, security and trust continue to deserve our full attention. We need new visualization techniques and innovative interfaces that will keep pace with the global dimension of information. This is the current scenario, but what will follow? What are the technologies and social and communication paradigms that we will be discussing in ten or twenty years? ELPUB 2012 focuses on the social shaping of digital publishing, exploring the interplay between culture and technology. This makes the fact that it is being held in the European Capital of Culture for 2012, Guimarães, Portugal, all the more appropriate. 52 submissions were received for ELPUB 2012, from which 23 articles and 10 posters were accepted after peer review. Of the accepted articles, 11 were submitted as full articles and 12 as extended abstracts. These articles have been grouped into sessions on the following topics: Sessions 1 and 4 – Digital Scholarship & Publishing; Session 2 – Special Archives; Session 3 – Libraries & Repositories, Session 5 – Digital Texts & Readings, and Session 6 – Future Solutions & Innovations. The programme features two keynote speeches. 
    Kathleen Fitzpatrick's speech is entitled “Planned Obsolescence: Publishing, Technology, and the Future of the Academy”; that of Antonio Câmara is entitled “Publishing in 2021”. Finally, we call your attention to the panel on e-books, which is entitled “Academic e-books – Technological hostage or cultural redeemer?”. We believe this is another great edition of the ELPUB conference. We would like to take this opportunity to thank both the members of the ELPUB executive committee and the members of the local advisory committee for making it happen. Together they provided valuable advice and assistance during the entire organization process. Secondly, we would like to mention our colleagues on the program committee, who assured the quality of the conference through the peer review process. Last but not least, we wish to thank the local organization team for ensuring that all this effort culminates in a very interesting scientific event on the 14th and 15th of June. Thank you all for helping us to maintain the quality of ELPUB and merit the trust of our authors and attendees. We wish you all a good conference and we say farewell, hoping to see you again in Sweden in 2013!

  • 8. Beyene, Ayne A.
    et al.
    Welemariam, Tewelle
    Persson, Marie
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Improved concept drift handling in surgery prediction and other applications. 2015. In: Knowledge and Information Systems, ISSN 0219-1377, Vol. 44, no 1, p. 177-196. Article in journal (Refereed)
    Abstract [en]

    The article presents a new algorithm for handling concept drift: the Trigger-based Ensemble (TBE) is designed to handle concept drift in surgery prediction but it is shown to perform well for other classification problems as well. At the primary care, queries about the need for surgical treatment are referred to a surgeon specialist. At the secondary care, referrals are reviewed by a team of specialists. The possible outcomes of this review are that the referral: (i) is canceled, (ii) needs to be complemented, or (iii) is predicted to lead to surgery. In the third case, the referred patient is scheduled for an appointment with a surgeon specialist. This article focuses on the binary prediction of case three (surgery prediction). The guidelines for the referral and the review of the referral are changed due to, e.g., scientific developments and clinical practices. Existing decision support is based on the expert systems approach, which usually requires manual updates when changes in clinical practice occur. In order to automatically revise decision rules, the occurrence of concept drift (CD) must be detected and handled. The existing CD handling techniques are often specialized; it is challenging to develop a more generic technique that performs well regardless of CD type. Experiments are conducted to measure the impact of CD on prediction performance and to reduce CD impact. The experiments evaluate and compare TBE to three existing CD handling methods (AWE, Active Classifier, and Learn++) on one real-world dataset and one artificial dataset. TBE significantly outperforms the other algorithms on both datasets but is less accurate on noisy synthetic variations of the real-world dataset.

  • 9. Bhattacharyya, Prantik
    et al.
    Rowe, Jeff
    Wu, Felix
    Haigh, Karen
    Lavesson, Niklas
    Johnson, Henric
    Your Best might not be Good enough: Ranking in Collaborative Social Search Engines. 2011. Conference paper (Refereed)
    Abstract [en]

    A relevant feature of online social networks like Facebook is the scope for users to share external information from the web with their friends by sharing a URL. The phenomenon of sharing has bridged the web graph with the social network graph and the shared knowledge in ego networks has become a source for relevant information for an individual user, leading to the emergence of social search as a powerful tool for information retrieval. Consideration of the social context has become an essential factor in the process of ranking results in response to queries in social search engines. In this work, we present InfoSearch, a social search engine built over the Facebook platform, which lets users search for information based on what their friends have shared. We identify and implement three distinct ranking factors based on the number of mutual friends, social group membership, and time stamp of shared documents to rank results for user searches. We perform user studies based on the Facebook feeds of two authors to understand the impact of each ranking factor on the result for two queries.

  • 10.
    Boeva, Veselka
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Angelova, Milena
    Technical University Sofia, BUL.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Rosander, Oliver
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Tsiporkova, Elena
    Collective Center for the Belgian Technological Industry, BEL.
    Evolutionary clustering techniques for expertise mining scenarios. 2018. In: ICAART 2018 - Proceedings of the 10th International Conference on Agents and Artificial Intelligence, Volume 2 / [ed] van den Herik J., Rocha A.P., SciTePress, 2018, Vol. 2, p. 523-530. Conference paper (Refereed)
    Abstract [en]

    The problem addressed in this article concerns the development of evolutionary clustering techniques that can be applied to adapt the existing clustering solution to a clustering of newly collected data elements. We are interested in clustering approaches that are specially suited for adapting clustering solutions in the expertise retrieval domain. This interest is inspired by practical applications such as expertise retrieval systems where the information available in the system database is periodically updated by extracting new data. The experts available in the system database are usually partitioned into a number of disjoint subject categories. It is becoming impractical to re-cluster this large volume of available information. Therefore, the objective is to update the existing expert partitioning by the clustering produced on the newly extracted experts. Three different evolutionary clustering techniques are considered to be suitable for this scenario. The proposed techniques are initially evaluated by applying the algorithms on data extracted from the PubMed repository. Copyright © 2018 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.

  • 11. Boeva, Veselka
    et al.
    Ivanova, Petia
    Lavesson, Niklas
    A Hybrid Computational Method for the Identification of Cell Cycle-regulated Genes. 2010. Conference paper (Refereed)
    Abstract [en]

    Gene expression microarrays are the most commonly available source of high-throughput biological data. They have been widely employed in recent years for the definition of cell cycle regulated (or periodically expressed) subsets of the genome in a number of different organisms. These have driven the development of various computational methods for identifying periodically expressed genes. However, the agreement is remarkably poor when different computational methods are applied to the same data. In view of this, we are motivated to propose herein a hybrid computational method targeting the identification of periodically expressed genes, which is based on a hybrid aggregation of estimations generated by different computational methods. The proposed hybrid method is benchmarked against three other computational methods for the identification of periodically expressed genes: statistical tests for regulation and periodicity and a combined test for regulation and periodicity. The hybrid method is shown, together with the combined test, to statistically significantly outperform the statistical test for periodicity. However, the hybrid method is also demonstrated to be significantly better than the combined test for regulation and periodicity.

  • 12. Boldt, Martin
    et al.
    Jacobsson, Andreas
    Lavesson, Niklas
    Davidsson, Paul
    Automated Spyware Detection Using End User License Agreements. 2008. Conference paper (Refereed)
    Abstract [en]

    The amount of spyware increases rapidly over the Internet and it is usually hard for the average user to know if a software application hosts spyware. This paper investigates the hypothesis that it is possible to detect from the End User License Agreement (EULA) whether its associated software hosts spyware or not. We generated a data set by collecting 100 applications with EULAs and classifying each EULA as either good or bad. An experiment was conducted, in which 15 popular default-configured mining algorithms were applied on the data set. The results show that 13 algorithms are significantly better than random guessing, thus we conclude that the hypothesis can be accepted. Moreover, 2 algorithms also perform significantly better than the current state-of-the-art EULA analysis method. Based on these results, we present a novel tool that can be used to prevent the installation of spyware.

  • 13. Borg, Anton
    et al.
    Boldt, Martin
    Lavesson, Niklas
    Informed Software Installation through License Agreement Categorization. 2011. Conference paper (Refereed)
    Abstract [en]

    Spyware detection can be achieved by using machine learning techniques that identify patterns in the End User License Agreements (EULAs) presented by application installers. However, solutions have required manual input from the user with varying degrees of accuracy. We have implemented an automatic prototype for extraction and classification and used it to generate a large data set of EULAs. This data set is used to compare four different machine learning algorithms when classifying EULAs. Furthermore, the effect of feature selection is investigated and for the top two algorithms, we investigate optimizing the performance using parameter tuning. Our conclusion is that feature selection and performance tuning are of limited use in this context, providing limited performance gains. However, both the Bagging and the Random Forest algorithms show promising results, with Bagging reaching an AUC measure of 0.997 and a False Negative Rate of 0.062. This shows the applicability of License Agreement Categorization for realizing informed software installation.

  • 14.
    Borg, Anton
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Boldt, Martin
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Melander, Ulf
    Boeva, Veselka
    Detecting serial residential burglaries using clustering. 2014. In: Expert Systems with Applications, ISSN 0957-4174, Vol. 41, no 11, p. 5252-5266. Article in journal (Refereed)
    Abstract [en]

    According to the Swedish National Council for Crime Prevention, law enforcement agencies solved approximately three to five percent of the reported residential burglaries in 2012. Internationally, studies suggest that a large proportion of crimes are committed by a minority of offenders. Law enforcement agencies, consequently, are required to detect series of crimes, or linked crimes. Comparison of crime reports today is difficult as no systematic or structured way of reporting crimes exists, and no ability to search multiple crime reports exists. This study presents a systematic data collection method for residential burglaries. A decision support system for comparing and analysing residential burglaries is also presented. The decision support system consists of an advanced search tool and a plugin-based analytical framework. In order to find similar crimes, law enforcement officers have to review a large number of crimes. The potential use of the cut-clustering algorithm to group crimes based on their characteristics, and thereby reduce the number of crimes to review in residential burglary analysis, is investigated. The characteristics used are modus operandi, residential characteristics, stolen goods, spatial similarity, and temporal similarity. Clustering quality is measured using the modularity index and accuracy is measured using the Rand index. The clustering solutions with the best quality scores were those based on residential characteristics, spatial proximity, and modus operandi, suggesting that the choice of which characteristic to use when grouping crimes can positively affect the end result. The results suggest that a high-quality clustering solution performs significantly better than a random guesser. In terms of practical significance, the presented clustering approach is capable of reducing the number of cases to review while keeping most connected cases. While the approach might miss some connections, it is also capable of suggesting new connections. The results also suggest that while crime series clustering is feasible, further investigation is needed.
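
The Rand index named in the abstract above as the accuracy measure can be sketched in a few lines. This is a generic textbook formulation, not code from the study; the function name `rand_index` and the toy labels are assumptions for illustration. It counts the fraction of item pairs on which two clusterings agree (both place the pair together, or both place it apart).

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    if len(labels_a) < 2:
        raise ValueError("need at least two items to form a pair")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        # The pair counts as an agreement if both clusterings treat it alike.
        agree += same_a == same_b
        total += 1
    return agree / total


# Hypothetical example: known crime series vs. a clustering result.
known_series = [0, 0, 1, 1]
clustering = [1, 1, 0, 0]  # same grouping, different label names
print(rand_index(known_series, clustering))
```

Because only pair membership matters, the index is unaffected by how the clusters are named.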

  • 15.
    Borg, Anton
    et al.
    Blekinge Institute of Technology, School of Computing.
    Lavesson, Niklas
    Blekinge Institute of Technology, School of Computing.
    E-mail Classification using Social Network Information. 2012. Conference paper (Refereed)
    Abstract [en]

    A majority of E-mail is suspected to be spam. Traditional spam detection fails to differentiate between user needs and evolving social relationships. Online Social Networks (OSNs) contain more and more social information, contributed by users. OSN information may be used to improve spam detection. This paper presents a method that can use several social networks for detecting spam and a set of metrics for representing OSN data. The paper investigates the impact of using social network data extracted from an E-mail corpus to improve spam detection. The social data model is compared to traditional spam data models by generating and evaluating classifiers from both model types. The results show that accurate spam detectors can be generated from the low-dimensional social data model alone; however, spam detectors generated from combinations of the traditional and social models were more accurate than the detectors generated from either model in isolation.

  • 16.
    Borg, Anton
    et al.
    Blekinge Institute of Technology, School of Computing.
    Lavesson, Niklas
    Blekinge Institute of Technology, School of Computing.
    Boeva, Veselka
    Comparison of clustering approaches for gene expression data. 2013. In: Frontiers in Artificial Intelligence and Applications, IOS Press, 2013, Vol. 257, p. 55-64. Conference paper (Refereed)
    Abstract [en]

    Clustering algorithms have been used to divide genes into groups according to the degree of their expression similarity. Such a grouping may suggest that the respective genes are correlated and/or co-regulated, and subsequently indicates that the genes could possibly share a common biological role. In this paper, four clustering algorithms are investigated: k-means, cut-clustering, spectral and expectation-maximization. The algorithms are benchmarked against each other. The performance of the four clustering algorithms is studied on time series expression data using Dynamic Time Warping distance in order to measure similarity between gene expression profiles. Four different cluster validation measures are used to evaluate the clustering algorithms: Connectivity and Silhouette Index for estimating the quality of clusters, Jaccard Index for evaluating the stability of a cluster method and Rand Index for assessing the accuracy. The obtained results are analyzed by Friedman's test and the Nemenyi post-hoc test. K-means is demonstrated to be significantly better than the spectral clustering algorithm under the Silhouette and Rand validation indices.
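
The Dynamic Time Warping distance mentioned in the abstract above can be sketched with the classic dynamic-programming recurrence. This is a generic formulation and not the paper's code; the function name `dtw_distance`, the absolute-difference local cost, and the toy profiles are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences.

    Uses |x - y| as the local cost and no warping-window constraint.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also aligned with b[j-1]'s predecessor step
                cost[i][j - 1],      # b[j-1] repeated against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Hypothetical expression profiles: the second is a time-stretched copy.
profile_x = [1, 2, 3]
profile_y = [1, 2, 2, 3]
print(dtw_distance(profile_x, profile_y))
```

A pairwise matrix of such distances could then be passed to a clustering algorithm that accepts precomputed dissimilarities.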

  • 17.
    Dasari, Siva Krishna
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Andersson, Petter
    Engineering Method Development, GKN Aerospace Engine Systems Sweden.
    Persson, Marie
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Tree-Based Response Surface Analysis. 2015. Conference paper (Refereed)
    Abstract [en]

    Computer-simulated experiments have become a cost effective way for engineers to replace real experiments in the area of product development. However, one single computer-simulated experiment can still take a significant amount of time. Hence, in order to minimize the amount of simulations needed to investigate a certain design space, different approaches within the design of experiments area are used. One of the used approaches is to minimize the time consumption and simulations for design space exploration through response surface modeling. The traditional methods used for this purpose are linear regression, quadratic curve fitting and support vector machines. This paper analyses and compares the performance of four machine learning methods for the regression problem of response surface modeling. The four methods are linear regression, support vector machines, M5P and random forests. Experiments are conducted to compare the performance of tree models (M5P and random forests) with the performance of non-tree models (support vector machines and linear regression) on data that is typical for concept evaluation within the aerospace industry. The main finding is that comprehensible models (the tree models) perform at least as well as or better than traditional black-box models (the non-tree models). The first observation of this study is that engineers understand the functional behavior, and the relationship between inputs and outputs, for the concept selection tasks by using comprehensible models. The second observation is that engineers can also increase their knowledge about design concepts, and they can reduce the time for planning and conducting future experiments.

  • 18.
    Davidsson, Paul
    et al.
    Blekinge Institute of Technology, School of Computing.
    Gustafsson Friberger, Marie
    Lavesson, Niklas
    Blekinge Institute of Technology, School of Computing.
    Persson, Jan
    Blekinge Institute of Technology, School of Computing.
    Towards a Prediction Model for People Movements in Urban Areas2013Conference paper (Refereed)
    Abstract [en]

    The aim of this work is to develop a new type of service for predicting and communicating urban activity. This service provides short-term predictions (hours to days), which can be used as a basis for different types of resource allocation and planning, e.g. concerning public transport, personnel, or marketing. The core of the service consists of a forecasting engine that, based on a prediction model, processes data on different levels of detail and from various providers. This paper explores the requirements and features of the forecasting engine. We conclude that agent-based modeling seems to be the most promising approach to meet these requirements. Finally, some examples of potential applications are described along with analyses of scientific and engineering issues that need to be addressed.

  • 19.
    García Martín, Eva
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Is it ethical to avoid error analysis?2017Conference paper (Refereed)
    Abstract [en]

    Machine learning algorithms tend to create more accurate models with the availability of large datasets. In some cases, highly accurate models can hide the presence of bias in the data. Several published studies tackle the development of discrimination-aware machine learning algorithms. We center on the further evaluation of machine learning models by doing error analysis, to understand under what conditions the model is not working as expected. We focus on the ethical implications of avoiding error analysis, from a falsification of results and discrimination perspective. Finally, we show different ways to approach error analysis in non-interpretable machine learning algorithms such as deep learning.

  • 20.
    García Martín, Eva
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Energy Efficiency Analysis of the Very Fast Decision Tree Algorithm2017In: Trends in Social Network Analysis: Information Propagation, User Behavior Modeling, Forecasting, and Vulnerability Assessment / [ed] Rokia Missaoui, Talel Abdessalem, Matthieu Latapy, Cham, Switzerland: Springer, 2017, p. 229-252Chapter in book (Refereed)
    Abstract [en]

    Data mining algorithms are usually designed to optimize a trade-off between predictive accuracy and computational efficiency. This paper introduces energy consumption and energy efficiency as important factors to consider during data mining algorithm analysis and evaluation. We conducted an experiment to illustrate how energy consumption and accuracy are affected when varying the parameters of the Very Fast Decision Tree (VFDT) algorithm. These results are compared with a theoretical analysis of the algorithm, indicating that energy consumption is affected by the parameter design and that it can be reduced significantly while maintaining accuracy.

  • 21.
    García Martín, Eva
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Energy Efficiency in Data Stream Mining2015In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2015, p. 1125-1132Conference paper (Refereed)
    Abstract [en]

    Data mining algorithms are usually designed to optimize a trade-off between predictive accuracy and computational efficiency. This paper introduces energy consumption and energy efficiency as important factors to consider during data mining algorithm analysis and evaluation. We extended the CRISP-DM (Cross Industry Standard Process for Data Mining) framework to include energy consumption analysis. Based on this framework, we conducted an experiment to illustrate how energy consumption and accuracy are affected when varying the parameters of the Very Fast Decision Tree (VFDT) algorithm. The results indicate that energy consumption can be reduced by up to 92.5% (557 J) while maintaining accuracy.

  • 22.
    García Martín, Eva
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Identification of Energy Hotspots: A Case Study of the Very Fast Decision Tree2017In: GPC 2017: Green, Pervasive, and Cloud Computing / [ed] Au M., Castiglione A., Choo KK., Palmieri F., Li KC., Cham, Switzerland: Springer, 2017, Vol. 10232, p. 267-281Conference paper (Refereed)
    Abstract [en]

    Large-scale data centers account for a significant share of the energy consumption in many countries. Machine learning technology requires intensive workloads and thus drives requirements for large amounts of power and cooling capacity in data centers. It is time to explore green machine learning. The aim of this paper is to profile a machine learning algorithm with respect to its energy consumption and to determine the causes behind this consumption. The first scalable machine learning algorithm able to handle large volumes of streaming data is the Very Fast Decision Tree (VFDT), which outputs competitive results in comparison to algorithms that analyze data from static datasets. Our objectives are to: (i) establish a methodology that profiles the energy consumption of decision trees at the function level, (ii) apply this methodology in an experiment to obtain the energy consumption of the VFDT, (iii) conduct a fine-grained analysis of the functions that consume most of the energy, providing an understanding of that consumption, and (iv) analyze how different parameter settings can significantly reduce the energy consumption. The results show that by addressing the most energy-intensive part of the VFDT, the energy consumption can be reduced by up to 74.3%.

  • 23.
    García Martín, Eva
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Casalicchio, Emiliano
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Boeva, Veselka
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Hoeffding Trees with nmin adaptationManuscript (preprint) (Other academic)
    Abstract [en]

    Machine learning software accounts for a significant amount of energy consumed in data centers. These algorithms are usually optimized towards predictive performance, i.e. accuracy, and scalability. This is the case for data stream mining algorithms. Although these algorithms are adaptive to the incoming data, they have fixed parameters from the beginning of the execution, which leads to energy hotspots. We present dynamic parameter adaptation for data stream mining algorithms to trade off energy efficiency against accuracy during runtime. To validate this approach, we introduce the nmin adaptation method to improve parameter adaptation in Hoeffding trees. This method dynamically adapts the number of instances needed to make a split (nmin) and thereby reduces the overall energy consumption. We created an experiment to compare the Very Fast Decision Tree algorithm (VFDT, the original Hoeffding tree algorithm) with nmin adaptation and the standard VFDT. The results show that VFDT with nmin adaptation consumes up to 89% less energy than the standard VFDT, trading off a few percent of accuracy. Our approach can be used to trade off energy consumption with predictive and computational performance in the drive towards resource-aware machine learning.

  • 24.
    García Martín, Eva
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Casalicchio, Emiliano
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Boeva, Veselka
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Hoeffding Trees with nmin adaptation2018In: The 5th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2018), IEEE, 2018, p. 70-79Conference paper (Refereed)
    Abstract [en]

    Machine learning software accounts for a significant amount of energy consumed in data centers. These algorithms are usually optimized towards predictive performance, i.e. accuracy, and scalability. This is the case for data stream mining algorithms. Although these algorithms are adaptive to the incoming data, they have fixed parameters from the beginning of the execution. We have observed that having fixed parameters leads to unnecessary computations, thus making the algorithm energy inefficient. In this paper we present the nmin adaptation method for Hoeffding trees. This method adapts the value of the nmin parameter, which significantly affects the energy consumption of the algorithm. The method reduces unnecessary computations and memory accesses, thus reducing the energy, while the accuracy is only marginally affected. We experimentally compared VFDT (Very Fast Decision Tree, the first Hoeffding tree algorithm) and CVFDT (Concept-adapting VFDT) with the VFDT-nmin (VFDT with nmin adaptation). The results show that VFDT-nmin consumes up to 27% less energy than the standard VFDT, and up to 92% less energy than CVFDT, trading off a few percent of accuracy in a few datasets.
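
    The role of nmin can be illustrated with the standard Hoeffding bound that Hoeffding trees use to decide when to split. The following is only a minimal sketch, not the authors' implementation: the function names are hypothetical, and the idea of estimating the required instance count from the current gap between the two best split candidates is an assumption about how such an adaptation might be approached.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R) is the range of the split metric, delta is the allowed
    error probability, and n is the number of instances seen at the leaf.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def instances_needed(value_range: float, delta: float, gain_gap: float) -> float:
    """Smallest n for which the Hoeffding bound is at most gain_gap.

    gain_gap is the observed difference between the two best split
    candidates (a hypothetical input for this sketch). Checking for a
    split before this many instances have arrived cannot succeed, so a
    value like this could serve as an adapted nmin.
    """
    if gain_gap <= 0:
        return math.inf  # the candidates are tied; no finite n suffices
    return math.ceil(
        value_range ** 2 * math.log(1.0 / delta) / (2.0 * gain_gap ** 2)
    )


if __name__ == "__main__":
    n = instances_needed(value_range=1.0, delta=1e-7, gain_gap=0.05)
    print(n, hoeffding_bound(1.0, 1e-7, n))
```

    A fixed nmin triggers split checks at regular intervals regardless of how far apart the candidates are; estimating the count from the observed gap skips checks that the bound already rules out.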

  • 25.
    García Martín, Eva
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Casalicchio, Emiliano
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Boeva, Veselka
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    How to Measure Energy Consumption in Machine Learning Algorithms2019In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): ECMLPKDD 2018: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases Workshops. Lecture Notes in Computer Science. Springer, Cham, 2019, Vol. 11329, p. 243-255Conference paper (Refereed)
    Abstract [en]

    Machine learning algorithms are responsible for a significant amount of computations. These computations are increasing with the advancements in different machine learning fields. For example, fields such as deep learning require algorithms to run for weeks, consuming vast amounts of energy. While there is a trend in optimizing machine learning algorithms for performance and energy consumption, there is still little knowledge on how to estimate an algorithm’s energy consumption. Currently, a straightforward cross-platform approach to estimate energy consumption for different types of algorithms does not exist. For that reason, well-known researchers in computer architecture have published extensive works on approaches to estimate the energy consumption. This study presents a survey of methods to estimate energy consumption, and maps them to specific machine learning scenarios. Finally, we illustrate our mapping suggestions with a case study, where we measure energy consumption in a big data stream mining scenario. Our ultimate goal is to bridge the current gap that exists in estimating energy consumption in machine learning scenarios.

  • 26. Grahn, Håkan
    et al.
    Lavesson, Niklas
    Lapajne, Mikael Hellborg
    Slat, Daniel
    A CUDA Implementation of Random Forests: Early Results2010Conference paper (Refereed)
    Abstract [en]

    Machine learning algorithms are frequently applied in data mining applications. Many of the tasks in this domain concern high-dimensional data. Consequently, these tasks are often complex and computationally expensive. This paper presents a GPU-based parallel implementation of the Random Forests algorithm. In contrast to previous work, the proposed algorithm is based on the compute unified device architecture (CUDA). An experimental comparison between the CUDA-based algorithm (CudaRF), and state-of-the-art parallel (FastRF) and sequential (LibRF) Random Forests algorithms shows that CudaRF outperforms both FastRF and LibRF for the studied classification task.

  • 27. Grahn, Håkan
    et al.
    Lavesson, Niklas
    Lapajne, Mikael Hellborg
    Slat, Daniel
    CudaRF: A CUDA-based Implementation of Random Forests2011Conference paper (Refereed)
    Abstract [en]

    Machine learning algorithms are frequently applied in data mining applications. Many of the tasks in this domain concern high-dimensional data. Consequently, these tasks are often complex and computationally expensive. This paper presents a GPU-based parallel implementation of the Random Forests algorithm. In contrast to previous work, the proposed algorithm is based on the compute unified device architecture (CUDA). An experimental comparison between the CUDA-based algorithm (CudaRF), and state-of-the-art Random Forests algorithms (FastRF and LibRF) shows that CudaRF outperforms both FastRF and LibRF for the studied classification task.

  • 28. Isaksson, Ola
    et al.
    Bertoni, Marco
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mechanical Engineering.
    Hallstedt, Sophie
    Blekinge Institute of Technology, Faculty of Engineering, Department of Strategic Sustainable Development.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Model Based Decision Support for Value and Sustainability in Product Development2015Conference paper (Refereed)
    Abstract [en]

    Decomposing and clarifying “sustainability” implications in the same way as concrete targets on product functionality is challenging, mainly due to the problem of showing numbers and ‘hard facts’ related to the value generated by sustainability-oriented decisions. The answer lies in methods and tools that are able, already in a preliminary design stage, to highlight how sustainable design choices can create value for customers and stakeholders, generating market success in the long term. The objective of the paper is to propose a framework where Sustainable Product Development (SPD) and Value Driven Design (VDD) can be integrated to realize a model-driven approach to support early stage design decisions. Also, the paper discusses how methods and tools for Model-Based Decision Support (MBDS) (e.g., response surface methodology) can be used to increase the computational efficiency of sustainability- and value-based analysis of design concepts. The paper proposes a range of activities to guide a model-based evaluation of sustainability consequences in design, showing also that capabilities already exist today for combining research efforts into a multidisciplinary decision-making environment.

  • 29.
    Johansson, Christian
    et al.
    NODA, SWE.
    Bergkvist, Markus
    NODA, SWE.
    Geysen, Davy
    EnergyVille, BEL.
    De Somer, Oscar
    EnergyVille, BEL.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Vanhoudt, Dirk
    EnergyVille, BEL.
    Operational Demand Forecasting In District Heating Systems Using Ensembles Of Online Machine Learning Algorithms2017In: 15TH INTERNATIONAL SYMPOSIUM ON DISTRICT HEATING AND COOLING (DHC15-2016) / [ed] Ulseth, R, ELSEVIER SCIENCE BV, 2017, p. 208-216Conference paper (Refereed)
    Abstract [en]

    Heat demand forecasting is in one form or another an integrated part of most optimisation solutions for district heating and cooling (DHC). Since DHC systems are demand driven, the ability to forecast this behaviour becomes an important part of most overall energy efficiency efforts. This paper presents the current status and results from extensive work in the development, implementation and operational service of online machine learning algorithms for demand forecasting. Recent results and experiences are compared to results predicted by previous work done by the authors. The prior work, based mainly on certain decision tree based regression algorithms, is expanded to include other forms of decision tree solutions as well as neural network based approaches. These algorithms are analysed both individually and combined in an ensemble solution. Furthermore, the paper also describes the practical implementation and commissioning of the system in two different operational settings where the data streams are analysed online in real-time. It is shown that the results are in line with expectations based on prior work, and that the demand predictions have a robust behaviour within acceptable error margins. Applications of such predictions in relation to intelligent network controllers for district heating are explored and the initial results of such systems are discussed. (C) 2017 The Authors. Published by Elsevier Ltd.

  • 30. Johnson, Henric
    et al.
    Lavesson, Niklas
    Oliveira, Daniela A. S. de
    Wu, Felix
    Trustworthy opportunistic sensing: A Social Computing Paradigm2011Conference paper (Refereed)
    Abstract [en]

    In recent years, technological advances have led to a society with communication platforms like iPhone and Kinect Xbox that are able to inject sensing presence into online social networks (OSNs). Thus, it is possible to create large-scale opportunistic networks by integrating sensors, applications and social networks, and this development could also promote innovative collaborative cyber security models. In this position paper, we discuss how social informatics will play a crucial role in trustworthy pervasive computing. With regard to security, our primary computing paradigm is still about processing information content only in order to make decisions. Given the availability of both digitized social informatics and sensor content, we now have the option to examine these sources simultaneously. We refer to this new era as the Social Computing Paradigm, and we argue that it could be particularly useful in conjunction with opportunistic sensing.

  • 31. Johnson, Henric
    et al.
    Lavesson, Niklas
    Zhao, Haifeng
    Wu, Shyhtsun Felix
    On the Concept of Trust in Online Social Networks2011In: Trustworthy Internet / [ed] Salgarelli, Luca; Bianchi, Giuseppe; Blefari-Melazzi, Nicola, Springer, 2011, p. 143-157Chapter in book (Refereed)
    Abstract [en]

    Online Social Networks (OSNs), such as Facebook, Twitter, and Myspace, provide new and interesting ways to communicate, share, and meet on the Internet. On the one hand, these features have arguably made many of the OSNs quite popular among the general population, but the growth of these networks has raised issues and concerns related to trust, privacy and security. On the other hand, some would argue that the true potential of OSNs has yet to be unleashed. The mainstream media have uncovered a rising number of potential and occurring problems, including incomprehensible security settings, unlawful spreading of private or copyrighted information, the occurrence of threats and so on. We present a set of approaches designed to improve the trustworthiness of OSNs. Each approach is described and related to ongoing research projects and to views expressed about trust by surveyed OSN users. Finally, we present some interesting pointers to future work.

  • 32.
    Kazemi, Samira
    et al.
    Blekinge Institute of Technology, School of Computing.
    Abghari, Shahrooz
    Blekinge Institute of Technology, School of Computing.
    Lavesson, Niklas
    Blekinge Institute of Technology, School of Computing.
    Johnson, Henric
    Blekinge Institute of Technology, School of Computing.
    Ryman, Peter
    Open Data for Anomaly Detection in Maritime Surveillance2013In: Expert Systems with Applications, ISSN 0957-4174, Vol. 40, no 14, p. 5719-5729Article in journal (Refereed)
    Abstract [en]

    Maritime Surveillance has received increased attention from a civilian perspective in recent years. Anomaly detection is one of many techniques available for improving the safety and security in this domain. Maritime authorities use confidential data sources for monitoring the maritime activities; however, a paradigm shift on the Internet has created new open sources of data. We investigate the potential of using open data as a complementary resource for anomaly detection in maritime surveillance. We present and evaluate a decision support system based on open data and expert rules for this purpose. We conduct a case study in which experts from the Swedish coastguard participate to conduct a real-world validation of the system. We conclude that the exploitation of open data as a complementary resource is feasible since our results indicate improvements in the efficiency and effectiveness of the existing surveillance systems by increasing the accuracy and covering unseen aspects of maritime activities.

  • 33. Kostadinova, Elena
    et al.
    Boeva, Veselka
    Lavesson, Niklas
    Clustering of Multiple Microarray Experiments Using Information Integration2011In: Information Technology in Bio- and Medical Informatics / [ed] Böhm, C., Springer, 2011, p. 123-137Chapter in book (Refereed)
    Abstract [en]

    In this article, we study two microarray data integration techniques and describe how they can be applied and validated on a set of independent, but biologically related, microarray data sets in order to derive consistent and relevant clustering results. First, we present a cluster integration approach, which combines the information contained in multiple data sets at the level of expression or similarity matrices, and then applies a clustering algorithm on the combined matrix for subsequent analysis. Second, we propose a technique for the integration of multiple partitioning results. The performance of the proposed cluster integration algorithms is evaluated on time series expression data using two clustering algorithms and three cluster validation measures. We also propose a modified version of the Figure of Merit (FOM) algorithm, which is suitable for estimating the predictive power of clustering algorithms when they are applied to multiple expression data sets. In addition, an improved version of the well-known connectivity measure is introduced to achieve a more objective evaluation of the connectivity performance of clustering algorithms.

  • 34.
    Kusetogullari, Hüseyin
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Handwriting image enhancement using local learning windowing, Gaussian Mixture Model and k-means clustering2016In: 2016 IEEE International Symposium on Signal Processing and Information Technology, ISSPIT 2016, Institute of Electrical and Electronics Engineers Inc., 2016, p. 305-310, article id 7886054Conference paper (Refereed)
    Abstract [en]

    In this paper, a new approach is proposed to enhance handwriting images by using learning-based windowing contrast enhancement and a Gaussian Mixture Model (GMM). A fixed-size window moves over the handwriting image, and two quantitative measures, discrete entropy (DE) and edge-based contrast measure (EBCM), are used to estimate the quality of each patch. The obtained results are used in an unsupervised learning method based on k-means clustering to label the quality of the handwriting as bad (if it is low contrast) or good (if it is high contrast). After that, if the corresponding patch is estimated to be low contrast, a contrast enhancement method is applied to the window to enhance the handwriting. GMM is used as a final step to smoothly exchange information between the original and enhanced images and to discard artifacts in the final image. The proposed method has been compared with other contrast enhancement methods on different datasets: Swedish historical documents, DIBCO2010, DIBCO2012 and DIBCO2013. The results illustrate that the proposed method enhances handwriting well compared to the existing contrast enhancement methods. © 2016 IEEE.
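
    As an illustration of the patch-quality step, discrete entropy is commonly computed as the Shannon entropy of a patch's grey-level histogram. The sketch below assumes that definition and a flat list of pixel intensities as input; the function name is hypothetical and the code is not taken from the paper.

```python
import math
from collections import Counter


def discrete_entropy(pixels):
    """Shannon entropy (in bits) of the grey-level histogram of a patch.

    pixels is a flat sequence of intensity values. A patch with a single
    grey level has entropy 0; a patch that spreads its pixels evenly over
    many levels has high entropy.
    """
    total = len(pixels)
    if total == 0:
        return 0.0
    counts = Counter(pixels)
    # Sum of p * log2(1/p) over the grey levels that occur in the patch.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    flat_patch = [128] * 16             # one grey level only
    two_level_patch = [0] * 8 + [255] * 8
    print(discrete_entropy(flat_patch), discrete_entropy(two_level_patch))
```

    Per-patch scores such as this one could then be fed, together with a contrast measure, to a two-cluster k-means step to separate low-contrast from high-contrast windows.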

  • 35. Lavesson, Niklas
    Evaluation and Analysis of Supervised Learning Algorithms and Classifiers2006Licentiate thesis, comprehensive summary (Other academic)
    Abstract [en]

    The fundamental question studied in this thesis is how to evaluate and analyse supervised learning algorithms and classifiers. As a first step, we analyse current evaluation methods. Each method is described and categorised according to a number of properties. One conclusion of the analysis is that performance is often only measured in terms of accuracy, e.g., through cross-validation tests. However, some researchers have questioned the validity of using accuracy as the only performance metric. Also, the number of instances available for evaluation is usually very limited. In order to deal with these issues, measure functions have been suggested as a promising approach. However, a limitation of current measure functions is that they can only handle two-dimensional instance spaces. We present the design and implementation of a generalised multi-dimensional measure function and demonstrate its use through a set of experiments. The results indicate that there are cases for which measure functions may be able to capture aspects of performance that cannot be captured by cross-validation tests. Finally, we investigate the impact of learning algorithm parameter tuning. To accomplish this, we first define two quality attributes (sensitivity and classification performance) as well as two metrics for measuring each of the attributes. Using these metrics, a systematic comparison is made between four learning algorithms on eight data sets. The results indicate that parameter tuning is often more important than the choice of algorithm. Moreover, quantitative support is provided to the assertion that some algorithms are more robust than others with respect to parameter configuration. 
    To sum up, the contributions of this thesis include: the definition and application of a formal framework which enables comparison and deeper understanding of evaluation methods from different fields of research, a survey of current evaluation methods, the implementation and analysis of a multi-dimensional measure function, and the definition and analysis of quality attributes used to investigate the impact of learning algorithm parameter tuning.

  • 36.
    Lavesson, Niklas
    Blekinge Institute of Technology, Department of Software Engineering and Computer Science.
    Evaluation of classifier performance and the impact of learning algorithm parameters2003Independent thesis Advanced level (degree of Master (One Year))Student thesis
    Abstract [en]

    Much research has been done in the fields of classifier performance evaluation and optimization. This work summarizes this research and tries to answer the question of whether algorithm parameter tuning has more impact on performance than the choice of algorithm. An alternative way of evaluation, a measure function, is also demonstrated. This type of evaluation is compared with one of the most accepted methods, the cross-validation test. Experiments, described in this work, show that parameter tuning often has more impact on performance than the actual choice of algorithm and that the measure function could be a complement or an alternative to the standard cross-validation tests.

  • 37. Lavesson, Niklas
    Learning Machine Learning: A Case Study2010In: IEEE Transactions on Education, ISSN 0018-9359, Vol. 53, no 4, p. 672-676Article in journal (Refereed)
    Abstract [en]

    This correspondence reports on a case study conducted in the Master's-level Machine Learning (ML) course at Blekinge Institute of Technology, Sweden. The students participated in a self-assessment test and a diagnostic test of prerequisite subjects, and their results on these tests are correlated with their achievement of the course's learning objectives.

  • 38. Lavesson, Niklas
    On the Metric-based Approach to Supervised Concept Learning2008Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    A classifier is a piece of software that is able to categorize objects for which the class is unknown. The task of automatically generating classifiers by generalizing from examples is an important problem in many practical applications. This problem is often referred to as supervised concept learning, and has been shown to be relevant in e.g. medical diagnosis, speech and handwriting recognition, stock market analysis, and other data mining applications. The main purpose of this thesis is to analyze current approaches to evaluate classifiers as well as supervised concept learners and to explore possible improvements in terms of alternative or complementary approaches. In particular, we investigate the metric-based approach to evaluation as well as how it can be used when learning. Any supervised concept learning algorithm can be viewed as trying to generate a classifier that optimizes a specific, often implicit, metric (this is sometimes also referred to as the inductive bias of the algorithm). In addition, different metrics are suitable for different learning tasks, i.e., the requirements vary between application domains. The idea of metric-based learning is to both make the metric explicit and let it be defined by the user based on the learning task at hand. The thesis contains seven studies, each with its own focus and scope. First, we present an analysis of current evaluation methods and contribute with a formalization of the problems of learning, classification and evaluation. We then present two quality attributes, sensitivity and classification performance, that can be used to evaluate learning algorithms. To demonstrate their usefulness, two metrics for these attributes are defined and used to quantify the impact of parameter tuning and the overall performance. Next, we refine an approach to multi-criteria classifier evaluation, based on the combination of three metrics and present algorithms for calculating these metrics. 
In the fourth study, we present a new method for multi-criteria evaluation, which is generic in the sense that it only dictates how to combine metrics. The actual choice of metrics is application-specific. The fifth study investigates whether or not the performance according to an arbitrary application-specific metric can be boosted by using that metric as the one that the learning algorithm aims to optimize. The subsequent study presents a novel data mining application for preventing spyware by classifying End User License Agreements. A number of state-of-the-art learning algorithms are compared using the generic multi-criteria method. Finally, in the last study we describe how methods from the area of software engineering can be used to solve the problem of selecting relevant evaluation metrics for the application at hand.

  • 39. Lavesson, Niklas
    Predicting the Risk of Future Hospitalization. 2010. Conference paper (Refereed)
    Abstract [en]

    People over 80 are the fastest-growing segment of the Swedish population. With this increase in age, the proportion of people with more than one chronic disease, multiple prescribed drugs, and disabilities is getting larger. At the same time, hospitalization accounts for a large share of the total cost of healthcare. We hypothesize that the number and duration of these hospitalizations could be reduced if primary care were given suitable tools to predict the risk and/or duration of hospitalization, which could then be used as a basis for providing suitable interventions. In this paper, we investigate the possibility of learning to predict the risk of hospitalization of the elderly by mining patient data, in terms of age, sex, as well as diseases and prescribed drugs for a large number of patients. We obtained diagnosis and drug use data from 2006 and associated these data with the number of days of hospitalization from 2007 for 406,272 subjects from the Östergötland county healthcare database. We suggest a data mining approach for automatically generating prediction models and empirically compare two learning algorithms on the problem of predicting the risk of hospitalization.

  • 40.
    Lavesson, Niklas
    et al.
    Blekinge Institute of Technology, School of Computing.
    Axelsson, Stefan
    Blekinge Institute of Technology, School of Computing.
    Similarity assessment for removal of noisy end user license agreements. 2012. In: Knowledge and Information Systems, ISSN 0219-1377, Vol. 32, no 1, p. 167-189. Article in journal (Refereed)
    Abstract [en]

    In previous work, we have shown that it is possible to automatically discriminate between legitimate software and spyware-associated software by performing supervised learning of end user license agreements (EULAs). However, the number of false positives (spyware classified as legitimate software) was too large for practical use. In this study, the false positives problem is addressed by removing noisy EULAs, which are identified by performing similarity analysis of the previously studied EULAs. Two candidate similarity analysis methods for this purpose are experimentally compared: cosine similarity assessment in conjunction with latent semantic analysis (LSA) and normalized compression distance (NCD). The results show that the number of false positives can be reduced significantly by removing noise identified by either method. However, the experimental results also indicate subtle performance differences between LSA and NCD. To improve the performance even further and to decrease the large number of attributes, the categorical proportional difference (CPD) feature selection algorithm was applied. CPD managed to greatly reduce the number of attributes while at the same time increasing classification performance on the original data set, as well as on the LSA- and NCD-based data sets.
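
As a rough illustration of the normalized compression distance mentioned in the abstract above, the following minimal sketch uses the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed length. The choice of zlib as the compressor, the function names, and the sample texts are assumptions made for this example only; the article's actual compressor and data are not stated here.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compressor is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical sample texts, not taken from the studied EULA collection.
eula_a = b"This software may collect usage data and display advertisements. " * 20
eula_b = eula_a.replace(b"advertisements", b"adverts")
unrelated = bytes(range(256)) * 5

if __name__ == "__main__":
    print("similar texts:  ", ncd(eula_a, eula_b))
    print("unrelated texts:", ncd(eula_a, unrelated))
```

Because real compressors are imperfect, the distance of a text to itself is small but usually not exactly zero, so any threshold for "noisy" documents would have to be chosen empirically.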

  • 41.
    Lavesson, Niklas
    et al.
    Blekinge Institute of Technology, School of Computing.
    Boeva, Veselka
    Tsiporkova, Elena
    Davidsson, Paul
    A method for evaluation of learning components. 2014. In: Automated Software Engineering: An International Journal, ISSN 0928-8910, E-ISSN 1573-7535, Vol. 21, no 1, p. 41-63. Article in journal (Refereed)
    Abstract [en]

    Today, it is common to include machine learning components in software products. These components offer specific functionalities such as image recognition, time series analysis, and forecasting but may not satisfy the non-functional constraints of the software products. It is difficult to identify suitable learning algorithms for a particular task and software product because the non-functional requirements of the product affect algorithm suitability. A particular suitability evaluation may thus require the assessment of multiple criteria to analyse trade-offs between functional and non-functional requirements. For this purpose, we present a method for APPlication-Oriented Validation and Evaluation (APPrOVE). This method comprises four sequential steps that address the stated evaluation problem. The method provides a common ground for different stakeholders and enables a multi-expert and multi-criteria evaluation of machine learning algorithms prior to inclusion in software products. Essentially, the problem addressed in this article concerns how to choose the appropriate machine learning component for a particular software product.

  • 42. Lavesson, Niklas
    et al.
    Boldt, Martin
    Davidsson, Paul
    Jacobsson, Andreas
    Learning to detect spyware using end user license agreements. 2011. In: Knowledge and Information Systems, ISSN 0219-1377, Vol. 26, no 2, p. 285-307. Article in journal (Refereed)
    Abstract [en]

    The amount of software that hosts spyware has increased dramatically. To avoid legal repercussions, the vendors need to inform users about inclusion of spyware via end user license agreements (EULAs) during the installation of an application. However, this information is intentionally written in a way that is hard for users to comprehend. We investigate how to automatically discriminate between legitimate software and spyware-associated software by mining EULAs. For this purpose, we compile a data set consisting of 996 EULAs, of which 9.6% are associated with spyware. We compare the performance of 17 learning algorithms with that of a baseline algorithm on two data sets based on a bag-of-words and a meta data model. The majority of learning algorithms significantly outperform the baseline regardless of which data representation is used. However, a non-parametric test indicates that bag-of-words is more suitable than the meta model. Our conclusion is that automatic EULA classification can be applied to assist users in making informed decisions about whether to install an application without having read the EULA. We therefore outline the design of a spyware prevention tool and suggest how to select suitable learning algorithms for the tool by using a multi-criteria evaluation approach.

  • 43. Lavesson, Niklas
    et al.
    Davidsson, Paul
    AMORI: A Metric-based One Rule Inducer. 2009. Conference paper (Refereed)
    Abstract [en]

    The requirements of real-world data mining problems vary extensively. It is plausible to assume that some of these requirements can be expressed as application-specific performance metrics. An algorithm that is designed to maximize performance given a certain learning metric may not produce the best possible result according to these application-specific metrics. We have implemented A Metric-based One Rule Inducer (AMORI), for which it is possible to select the learning metric. We have compared the performance of this algorithm by embedding three different learning metrics (classification accuracy, the F-measure, and the area under the ROC curve), on 19 UCI data sets. In addition, we have compared the results of AMORI with those obtained using an existing rule learning algorithm of similar complexity (One Rule) and a state-of-the-art rule learner (Ripper). The experiments show that a performance gain is achieved, for all included metrics, when using identical metrics for learning and evaluation. We also show that each AMORI/metric combination outperforms One Rule when using identical learning and evaluation metrics. The performance of AMORI is acceptable when compared with Ripper. Overall, the results suggest that metric-based learning is a viable approach.
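
The idea of selecting the learning metric can be illustrated with a simplified single-attribute rule learner in which the metric used to pick the best rule is passed in as a parameter. This is only a sketch under stated assumptions: it is not the AMORI implementation, and the function names, the majority-class rule construction, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """F1 score for the class given by `positive`."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def one_rule(rows, labels, metric):
    """Return (attribute index, value-to-class rule, training score) for the
    single-attribute rule that maximizes the user-supplied metric."""
    default = Counter(labels).most_common(1)[0][0]
    best = None
    for a in range(len(rows[0])):
        # Count class frequencies for every value of attribute a.
        counts = defaultdict(Counter)
        for row, y in zip(rows, labels):
            counts[row[a]][y] += 1
        # Map each attribute value to its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        preds = [rule.get(row[a], default) for row in rows]
        score = metric(labels, preds)
        if best is None or score > best[2]:
            best = (a, rule, score)
    return best


# Hypothetical toy data: attribute 0 separates the classes, attribute 1 does not.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = [1, 1, 0, 0]

if __name__ == "__main__":
    print(one_rule(rows, labels, accuracy))
    print(one_rule(rows, labels, f_measure))
```

Swapping `accuracy` for `f_measure` changes only the selection criterion, which is the kind of decoupling between learner and metric that the abstract describes.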

  • 44. Lavesson, Niklas
    et al.
    Davidsson, Paul
    Analysis of Multi-Criteria Methods for Classifier and Algorithm Evaluation. 2007. Conference paper (Refereed)
  • 45. Lavesson, Niklas
    et al.
    Davidsson, Paul
    APPrOVE: Application-oriented Validation and Evaluation of Supervised Learners. 2010. Conference paper (Refereed)
    Abstract [en]

    Learning algorithm evaluation is usually focused on classification performance. However, the characteristics and requirements of real-world applications vary greatly. Thus, for a particular application, some evaluation criteria are more important than others. In fact, multiple criteria need to be considered to capture application-specific trade-offs. Many multi-criteria methods can be used for the actual evaluation but the problems of selecting appropriate criteria and metrics as well as capturing the trade-offs still persist. This paper presents a framework for application-oriented validation and evaluation (APPrOVE). The framework includes four sequential steps that together address the aforementioned problems and its use in practice is demonstrated through a case study.

  • 46. Lavesson, Niklas
    et al.
    Davidsson, Paul
    Evaluating learning algorithms and classifiers. 2007. In: International Journal of Intelligent Information and Database Systems, ISSN 1751-5858, Vol. 1, no 1, p. 37-52. Article in journal (Refereed)
  • 47. Lavesson, Niklas
    et al.
    Davidsson, Paul
    Generic Methods for Multi-criteria Evaluation. 2008. Conference paper (Refereed)
    Abstract [en]

    When evaluating data mining algorithms that are applied to solve real-world problems, there are often several conflicting criteria that need to be considered. We investigate the concept of generic multi-criteria (MC) classifier and algorithm evaluation and perform a comparison of existing methods. This comparison makes explicit some of the important characteristics of MC analysis and focuses on finding out which method is most suitable for further development. Generic MC methods can be described as frameworks for combining evaluation metrics and are generic in the sense that the metrics used are not dictated by the method; the choice of metric is instead dependent on the problem at hand. We discuss some scenarios that benefit from the application of generic MC methods and synthesize what we believe are attractive properties from the reviewed methods into a new method called the candidate evaluation function (CEF). Finally, we present a case study in which we apply CEF to trade off several criteria when solving a real-world problem.

  • 48. Lavesson, Niklas
    et al.
    Davidsson, Paul
    Quantifying the Impact of Learning Algorithm Parameter Tuning. 2006. Conference paper (Refereed)
  • 49. Lavesson, Niklas
    et al.
    Davidsson, Paul
    Towards Application-specific Evaluation Metrics. 2008. Conference paper (Refereed)
    Abstract [en]

    Classifier evaluation has historically been conducted by estimating predictive accuracy via cross-validation tests or similar methods. More recently, ROC analysis has been shown to be a good alternative. However, the characteristics vary greatly between problem domains and it has been shown that some evaluation metrics are more appropriate than others in certain cases. We argue that different problems have different requirements and should therefore make use of evaluation metrics that correspond to the relevant requirements. For this purpose, we motivate the need for generic multi-criteria evaluation methods, i.e., methods that dictate how to integrate metrics but not which metrics to integrate. We present such a generic evaluation method and discuss how to select metrics on the basis of the application at hand.

  • 50. Lavesson, Niklas
    et al.
    Davidsson, Paul
    Boldt, Martin
    Jacobsson, Andreas
    Spyware Prevention by Classifying End User License Agreements. 2008. In: New Challenges in Applied Intelligence Technologies / [ed] Nguyen, Ngoc Thanh; Katarzyniak, Radoslaw, Berlin / Heidelberg: Springer, 2008, p. 373-382. Chapter in book (Refereed)
    Abstract [en]

    We investigate the hypothesis that it is possible to detect from the End User License Agreement (EULA) if the associated software hosts spyware. We apply 15 learning algorithms to a data set consisting of 100 applications with classified EULAs. The results show that 13 algorithms are significantly more accurate than random guessing. Thus, we conclude that the hypothesis can be accepted. Based on the results, we present a novel tool that can be used to prevent spyware by automatically halting application installers and classifying the EULA, giving users the opportunity to make an informed choice about whether to continue with the installation. We discuss positive and negative aspects of this prevention approach and suggest a method for evaluating candidate algorithms for a future implementation.
