  • 1.
    Abghari, Shahrooz
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    García Martín, Eva
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Johansson, Christian
    NODA Intelligent Systems AB, SWE.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Trend analysis to automatically identify heat program changes. 2017. In: Energy Procedia, Elsevier, 2017, Vol. 116, p. 407-415. Conference paper (Refereed)
    Abstract [en]

    The aim of this study is to improve the monitoring and control of heating systems located in customer buildings through the use of a decision support system. To achieve this, the proposed system applies a two-step classifier to detect manual changes of the temperature of the heating system. We use data from the Swedish company NODA, active in energy optimization and services for energy efficiency, to train and test the suggested system. The decision support system is evaluated through an experiment and the results are validated by experts at NODA. The results show that the decision support system can detect changes within three days after their occurrence by considering only daily average measurements.
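
    The abstract does not spell out the two-step classifier, but the idea of flagging and then confirming a manual change from daily averages can be sketched as below (Python; the window size, threshold, and confirmation horizon are illustrative assumptions, not NODA's tuned values).

        import numpy as np

        def detect_manual_changes(daily_avg, window=7, z=3.0, confirm=3):
            # Step 1: flag a day whose daily average deviates strongly from
            # the recent level. Step 2: require the new level to persist,
            # so a change is confirmed within a few days of its occurrence.
            changes = []
            for i in range(window, len(daily_avg) - confirm + 1):
                ref = daily_avg[i - window:i]
                mu, sigma = ref.mean(), ref.std() + 1e-9
                if abs(daily_avg[i] - mu) > z * sigma:
                    after = daily_avg[i:i + confirm]
                    if (np.abs(after - mu) > z * sigma).all():
                        changes.append(i)
            return changes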

  • 2.
    Aziz, Hussein Muzahim
    et al.
    Blekinge Institute of Technology, School of Computing.
    Fiedler, Markus
    Blekinge Institute of Technology, School of Computing.
    Grahn, Håkan
    Blekinge Institute of Technology, School of Computing.
    Lundberg, Lars
    Blekinge Institute of Technology, School of Computing.
    Compressing Video Based on Region of Interest. 2013. Conference paper (Refereed)
    Abstract [en]

    Real-time video streaming suffers from bandwidth limitations that make it impossible to handle the high volume of video data. To reduce the amount of data to be streamed, we propose an adaptive technique that crops the important part of the video frames and drops the parts outside it; this important part is called the Region of Interest (ROI). The Sum of Absolute Differences (SAD) is computed over consecutive video frames on the server side to identify and extract the ROI. The ROI is extracted from the frames that lie between reference frames, based on three scenarios. The scenarios are designed to position the reference frames in the video frame sequence. Linear interpolation from the reference frames is performed on the mobile side to reconstruct the parts outside the ROI. We evaluate the proposed approach for the three scenarios by looking at the size of the compressed videos, and measure the quality of the videos using the Mean Opinion Score (MOS). The results show that our technique significantly reduces the amount of data to be streamed over wireless networks while providing acceptable video quality to the mobile viewers.
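
    A minimal sketch of the server-side ROI step, assuming grayscale numpy frames: per-block SAD between consecutive frames marks the active region, and its bounding box becomes the ROI (the block size and the fraction of blocks kept are illustrative parameters).

        import numpy as np

        def roi_from_sad(prev, curr, block=16, keep=0.2):
            # Per-block sum of absolute differences between two frames.
            diff = np.abs(curr.astype(np.int32) - prev.astype(np.int32))
            hb, wb = diff.shape[0] // block, diff.shape[1] // block
            sad = diff[:hb * block, :wb * block]
            sad = sad.reshape(hb, block, wb, block).sum(axis=(1, 3))
            # Keep the most active blocks and take their bounding box.
            ys, xs = np.nonzero(sad >= np.quantile(sad, 1.0 - keep))
            y0, y1 = ys.min() * block, (ys.max() + 1) * block
            x0, x1 = xs.min() * block, (xs.max() + 1) * block
            return y0, y1, x0, x1  # pixel coordinates of the ROI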

  • 3. Aziz, Hussein Muzahim
    et al.
    Fiedler, Markus
    Blekinge Institute of Technology, School of Computing.
    Grahn, Håkan
    Blekinge Institute of Technology, School of Computing.
    Lundberg, Lars
    Blekinge Institute of Technology, School of Computing.
    Eliminating the Effects of Freezing Frames on User Perceptive by Using a Time Interleaving Technique. 2012. In: Multimedia Systems, ISSN 0942-4962, E-ISSN 1432-1882, Vol. 18, no 3, p. 251-262. Article in journal (Refereed)
    Abstract [en]

    Streaming video over a wireless network faces several challenges, such as high packet error rates, bandwidth variations, and delays, which can degrade the video stream so that the viewer perceives a frozen picture for certain durations due to lost frames. In this study, we propose a Time Interleaving Robust Streaming (TIRS) technique to significantly reduce the frozen video problem and provide satisfactory quality to the mobile viewer. This is done by reordering the streamed video frames as groups of even and odd frames. The objective of streaming the video in this way is to avoid losing a sequence of neighbouring frames in case of a long interruption. We evaluate our approach using a user panel and mean opinion score (MOS) measurements, where the users observe three levels of frame losses. The results show that our technique significantly improves the smoothness of the video on the mobile device in the presence of frame losses, while the transmitted data increase by only about 9% (due to reduced time locality).
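
    The reordering idea can be illustrated in a few lines (a simplification of TIRS, assuming the whole sequence is available up front): after interleaving, a burst loss in transmission order hits alternating display frames, so each lost frame keeps its neighbours for concealment.

        def tirs_reorder(frames):
            # Send all even-indexed frames, then all odd-indexed ones.
            return frames[0::2] + frames[1::2]

        def tirs_restore(received, n):
            # Invert the reordering; lost frames arrive as None.
            half = (n + 1) // 2
            evens, odds = received[:half], received[half:]
            return [(evens if i % 2 == 0 else odds)[i // 2] for i in range(n)]

        frames = list(range(10))
        sent = tirs_reorder(frames)
        sent[3:6] = [None] * 3            # simulate a burst loss in transit
        print(tirs_restore(sent, 10))     # losses land on non-adjacent frames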

  • 4. Aziz, Hussein Muzahim
    et al.
    Fiedler, Markus
    Grahn, Håkan
    Lundberg, Lars
    Streaming Video as Space-Divided Sub-Frames over Wireless Networks. 2010. Conference paper (Refereed)
    Abstract [en]

    Real-time video streaming suffers from lost, delayed, and corrupted frames due to transmission over error-prone channels. As an effect of this, the user may notice a frozen picture on the screen. In this work, we propose a technique to eliminate frozen video and provide satisfactory quality to the mobile viewer by splitting the video frames into sub-frames. Multiple description coding (MDC) is used to generate multiple bitstreams based on frame splitting, transmitted over multiple channels. We evaluate our approach using mean opinion score (MOS) measurements. MOS is used to evaluate our scenarios, where the users observe three levels of frame losses for real-time video streaming. The results show that our technique significantly improves the video smoothness on the mobile device in the presence of frame losses during transmission.

  • 5. Aziz, Hussein Muzahim
    et al.
    Grahn, Håkan
    Lundberg, Lars
    Eliminating the Freezing Frames for the Mobile User over Unreliable Wireless Networks. 2009. Conference paper (Refereed)
    Abstract [en]

    The main challenge of real-time video streaming over a wireless network is to provide good quality of service (QoS) to the mobile viewer. However, wireless networks have a limited bandwidth that may not be able to handle the continuous video frame sequence, and video frames may be dropped or corrupted during transmission. This can severely affect the video quality. In this study we propose a mechanism to eliminate frozen video and provide satisfactory quality to the mobile viewer. This is done by splitting the video frames into sub-frames that are transmitted over multiple channels. We present a subjective test, the Mean Opinion Score (MOS). MOS is used to evaluate our scenarios, where the users observe three levels of frame losses for real-time video streaming. The results indicate that our technique significantly improves the perceived video quality.

  • 6. Aziz, Hussein Muzahim
    et al.
    Grahn, Håkan
    Lundberg, Lars
    Sub-Frame Crossing for Streaming Video over Wireless Networks. 2010. Conference paper (Refereed)
    Abstract [en]

    Transmitting real-time streaming video over a wireless network cannot guarantee that all frames are received by the mobile device. The characteristics of a wireless network in terms of available bandwidth, frame delay, and frame losses cannot be known in advance. In this work, we propose a new mechanism for streaming video over a wireless channel. The proposed mechanism prevents frozen frames on the mobile device. This is done by splitting each video frame into two sub-frames and combining each of them with a sub-frame from a different position in the streaming video. If a frame is lost or dropped, there is still a possibility that the other half (sub-frame) is received by the mobile device. The received sub-frames are then reconstructed to their original shape. A rate adaptation mechanism is also highlighted in this work. We show that the server can skip up to 50% of the sub-frames while we are still able to reconstruct the received frames and eliminate frozen pictures on the mobile device.

  • 7. Behnam, Moris
    et al.
    Nemati, Farhang
    Nolte, Thomas
    Grahn, Håkan
    Towards an efficient approach for resource sharing in real-time multiprocessor systems. 2011. Conference paper (Refereed)
    Abstract [en]

    Supporting resource sharing in multiprocessor architectures is one of the major problems that limit the potential performance benefits of using such architectures for real-time systems. Many approaches and algorithms have been proposed to support resource sharing; however, most of them impose either high blocking times on tasks or a large memory allocation. In this paper we investigate the possibility of combining lock-based approaches and wait-free approaches (using multiple buffers) in order to decrease both the blocking times that may affect the schedulability of tasks and the required memory. To achieve this, we propose a solution based on evaluating the maximum allowed blocking time for each task according to the schedulability analysis, and then finding the minimum memory allocation for each resource that keeps the blocking times on tasks below the maximum allowed blocking times.

  • 8. Broberg, Magnus
    et al.
    Lundberg, Lars
    Grahn, Håkan
    A Tool for Binding Threads to Processors. 2001. Conference paper (Refereed)
    Abstract [en]

    Many multiprocessor systems are based on distributed shared memory. For performance reasons, it is often important to statically bind threads to processors in order to avoid remote memory accesses. Finding a good allocation takes a long time, and it is hard to know when to stop searching for a better one. It is sometimes impossible to run the application on the target machine. The developer needs a tool that finds good allocations without the target multiprocessor. We present a tool that uses a greedy algorithm and produces allocations that are more than 40% faster (on average) than those produced by a bin packing algorithm. The number of allocations to be evaluated can be reduced by 38% with a 2% performance loss. Finally, an algorithm is proposed that is promising in avoiding local maxima.
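
    The greedy idea can be sketched as below; `cost` stands in for the tool's simulation-backed estimate of an allocation's execution time (the toy makespan cost uses only per-thread times, whereas the real tool also captures thread interactions).

        def greedy_allocate(threads, n_procs, cost):
            # Bind each thread, in turn, to the processor that gives the
            # lowest estimated execution time for the partial allocation.
            alloc = {}
            for t in threads:
                alloc[t] = min(range(n_procs),
                               key=lambda p: cost({**alloc, t: p}))
            return alloc

        # Toy cost function: makespan from per-thread execution times.
        times = {"t0": 8, "t1": 5, "t2": 4, "t3": 3}
        def makespan(alloc, n_procs=2):
            loads = [0] * n_procs
            for t, p in alloc.items():
                loads[p] += times[t]
            return max(loads)

        order = sorted(times, key=times.get, reverse=True)  # longest first
        print(greedy_allocate(order, 2, makespan))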

  • 9. Broberg, Magnus
    et al.
    Lundberg, Lars
    Grahn, Håkan
    An Allocation Strategy Using Shadow-processors and Simulation Technique. 2001. Conference paper (Refereed)
    Abstract [en]

    Efficient performance tuning of parallel programs for multiprocessors is often hard. When it comes to assigning threads to processors, there is not much support from commercial operating systems, such as the Solaris operating system. The only known value is, in the best case, the total execution time of each thread. The developer is left with the bin packing algorithm and no knowledge about the interactions and dependencies between the threads. In the worst case, the bin packing algorithm assigns the threads to the processors such that the program has the longest possible execution time. A simple example of such a program is shown. We present a way of retrieving more information, and a test mechanism that makes it possible to compare two different assignments of threads to processors, also with regard to the interactions and dependencies between the threads. We also propose an algorithm that gives the best assignment of threads to processors in the case above, where the bin packing algorithm gave the worst possible assignment. The algorithm uses shadow processors and requires more processors than available on the target machine during some allocation steps. Thus, a simulation tool like the one presented here must be used.

  • 10. Broberg, Magnus
    et al.
    Lundberg, Lars
    Grahn, Håkan
    Performance Optimization using Critical Path Analysis in Multithreaded Programs on Multiprocessors. 1999. Report (Other academic)
    Abstract [en]

    Efficient performance tuning of parallel programs is often hard. Optimization is often done after the program is written, as a last effort to increase performance. With sequential programs, each (executed) code segment affects the total execution time of the program. Thus, any code segment that is optimized in a sequential program decreases the execution time. For a parallel program executed on a multiprocessor this is not always true, due to dependencies between the different threads. As a result, certain code segments of the execution may not affect the total execution time of the program, and optimizing such code segments will not increase the performance. In this paper we present a new approach to the optimization phase. Our approach finds the critical path of the multithreaded program, and optimization is only done on those specific code segments of the program. We have implemented the critical path analysis in a performance optimization tool.
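
    The core computation, finding the critical path through a dependency graph of code segments, can be sketched as follows (segment names and times are illustrative; the tool works on recorded thread events).

        from functools import lru_cache

        duration = {"A": 4.0, "B": 2.0, "C": 7.0, "D": 3.0}
        preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

        @lru_cache(maxsize=None)
        def finish(seg):
            # Earliest finish time: own time plus the slowest dependency.
            return duration[seg] + max((finish(p) for p in preds[seg]), default=0.0)

        def critical_path():
            # Walk backwards from the latest-finishing segment.
            path, seg = [], max(duration, key=finish)
            while True:
                path.append(seg)
                if not preds[seg]:
                    return path[::-1]
                seg = max(preds[seg], key=finish)

        print(critical_path())  # ['A', 'C', 'D']: only these repay optimization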

  • 11.
    Broberg, Magnus
    et al.
    Blekinge Institute of Technology, Department of Software Engineering and Computer Science.
    Lundberg, Lars
    Blekinge Institute of Technology, Department of Software Engineering and Computer Science.
    Grahn, Håkan
    Blekinge Institute of Technology, Department of Software Engineering and Computer Science.
    Performance Optimization using Extended Critical Path Analysis in Multithreaded Programs on Multiprocessors. 2001. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, Vol. 61, no 1, p. 115-136. Article in journal (Refereed)
  • 12. Broberg, Magnus
    et al.
    Lundberg, Lars
    Grahn, Håkan
    Selecting simulation models when predicting parallel program behavior. 2002. Conference paper (Refereed)
    Abstract [en]

    The use of multiprocessors is an important way to increase the performance of a parallel program. This means that the program has to be parallelized to make use of the multiple processors. The parallelization is unfortunately not an easy task, so development tools supporting parallel programs are important. Further, it is the customer that decides the number of processors in the target machine, and as a result the developer has to make sure that the program runs efficiently on any number of processors. Many simulation tools support the developer by simulating any number of processors and predicting the performance based on a uniprocessor execution trace. This popular technique gives reliable results in many cases. Based on our experience from developing such a tool, and from studying other (commercial) tools, we have identified three basic simulation models. Due to the flexibility of general-purpose programming languages and operating systems, like C/C++ and Sun Solaris, two of the models may cause deadlock in a deadlock-free program. Selecting the appropriate model is difficult; we show that the three models have significantly different accuracy when used on real-world programs. Based on the findings we present a practical scheme for when to use each model.

  • 13. Broberg, Magnus
    et al.
    Lundberg, Lars
    Grahn, Håkan
    Selecting Simulation Models when Predicting Parallel Program Behaviour. 2002. Conference paper (Refereed)
  • 14. Broberg, Magnus
    et al.
    Lundberg, Lars
    Grahn, Håkan
    Selecting Simulation Models when Predicting Parallel Program Behaviour. 2002. Report (Other academic)
    Abstract [en]

    The use of multiprocessors is an important way to increase the performance of a supercomputing program. This means that the program has to be parallelized to make use of the multiple processors. The parallelization is unfortunately not an easy task, so development tools supporting parallel programs are important. Further, it is the customer that decides the number of processors in the target machine, and as a result the developer has to make sure that the program runs efficiently on any number of processors. Many simulation tools support the developer by simulating any number of processors and predicting the performance based on a uniprocessor execution trace. This popular technique gives reliable results in many cases. Based on our experience from developing such a tool, and from studying other (commercial) tools, we have identified three basic simulation models. Due to the flexibility of general-purpose programming languages and operating systems, like C/C++ and Sun Solaris, two of the models may cause deadlock in a deadlock-free program. Selecting the appropriate model is difficult, since, as we also show in this paper, the three models have significantly different accuracy when used on real-world programs. Based on the findings we present a practical scheme for when to use the three models.

  • 15. Broberg, Magnus
    et al.
    Lundberg, Lars
    Grahn, Håkan
    Visualization and performance prediction of multithreaded Solaris programs by tracing kernel threads. 1999. Conference paper (Refereed)
    Abstract [en]

    Efficient performance tuning of parallel programs is often hard. We present a performance prediction and visualization tool called VPPB. Based on a monitored uniprocessor execution, VPPB shows the (predicted) behaviour of a multithreaded program using any number of processors, and the program behaviour is visualized as a graph. The first version of VPPB was unable to handle I/O operations. This version has, through an improved tracing technique, added the possibility to trace activities at the kernel level as well. Thus, VPPB is now able to trace various I/O activities, e.g., manipulation of OS-internal buffers, physical disk I/O, socket I/O, and RPC. VPPB allows flexible performance tuning of parallel programs developed for shared-memory multiprocessors using a standardized environment: C/C++ programs that use the thread package in Solaris 2.X.

  • 16. Broberg, Magnus
    et al.
    Lundberg, Lars
    Grahn, Håkan
    VPPB: A Visualization and Performance Prediction Tool for Multithreaded Solaris Programs. 1998. Conference paper (Refereed)
    Abstract [en]

    Efficient performance tuning of parallel programs is often hard. In this paper we describe an approach that uses a uniprocessor execution of a multithreaded program as reference to simulate a multiprocessor execution. The speed-up is predicted, and the program behaviour is visualized as a graph, which can be used in the performance tuning process. The simulator considers scheduling as well as hardware parameters, e.g., the thread priority, number of LWPs, and number of CPUs. The visualization part shows the simulated execution in two graphs: one showing the threads' behaviour over time and the other the amount of parallelism over time. In the first graph it is possible to relate an event in the graph to the code line causing the event. Validation using a Sun multiprocessor with eight processors and five scientific parallel applications shows that the speed-up predictions are within +/-6% of a real execution.

  • 17.
    Cheddad, Abbas
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Kusetogullari, Hüseyin
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Object recognition using shape growth pattern. 2017. In: Proceedings of the 10th International Symposium on Image and Signal Processing and Analysis, ISPA, IEEE Computer Society Digital Library, 2017, p. 47-52, article id 8073567. Conference paper (Refereed)
    Abstract [en]

    This paper proposes a preprocessing stage to augment the bank of features that one can retrieve from binary images, to help increase the accuracy of pattern recognition algorithms. To this end, by applying successive dilations to a given shape, we can capture a new dimension of its vital characteristics, which we term hereafter the shape growth pattern (SGP). This work investigates the feasibility of such a notion and also builds upon our prior work on structure-preserving dilation using Delaunay triangulation. Experiments on two public data sets are conducted, including comparisons to existing algorithms. We deployed two renowned machine learning methods in the classification process (i.e., convolutional neural networks (CNN) and random forests (RF)) since they perform well in pattern recognition tasks. The results show a clear improvement in the proposed approach's classification accuracy (especially for data sets with limited training samples) as well as robustness against noise when compared to existing methods.
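
    A simplified rendition of the SGP feature in Python: apply successive dilations to the binary shape and record its area after each one (plain binary dilation is used here for brevity; the paper builds on a structure-preserving dilation based on Delaunay triangulation).

        import numpy as np
        from scipy.ndimage import binary_dilation

        def shape_growth_pattern(mask, steps=10):
            # Grow the binary shape step by step and record its area;
            # the resulting curve augments the usual shape features.
            current = mask.astype(bool)
            areas = []
            for _ in range(steps):
                current = binary_dilation(current)
                areas.append(int(current.sum()))
            return np.array(areas)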

  • 18.
    Danielsson, Max
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Technology and Aesthetics.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Sievert, Thomas
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mathematics and Natural Sciences.
    Rasmusson, Jim
    Sony Mobile Communications AB, SWE.
    Comparing Two Generations of Embedded GPUs Running a Feature Detection Algorithm. Manuscript (preprint) (Other academic)
    Abstract [en]

    Graphics processing units (GPUs) in embedded mobile platforms are reaching performance levels where they may be useful for computer vision applications. We compare two generations of embedded GPUs for mobile devices when running a state-of-the-art feature detection algorithm, i.e., Harris-Hessian/FREAK. We compare architectural differences, execution time, temperature, and frequency on Sony Xperia Z3 and Sony Xperia XZ mobile devices. Our results indicate that the performance is soon sufficient for real-time feature detection, the GPUs have no temperature problems, and support for large work-groups is important.

  • 19. Danielsson, Max
    et al.
    Sievert, Thomas
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mathematics and Natural Sciences.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Rasmusson, Jim
    Sony Mobile Communications AB.
    Feature Detection and Description using a Harris-Hessian/FREAK Combination on an Embedded GPU. 2016. Conference paper (Refereed)
    Abstract [en]

    GPUs in embedded platforms are reaching performance levels comparable to desktop hardware, thus it becomes interesting to apply computer vision techniques to them. We propose, implement, and evaluate a novel feature detector and descriptor combination, i.e., we combine the Harris-Hessian detector with the FREAK binary descriptor. The implementation is done in OpenCL, and we evaluate the execution time and classification performance. We compare our approach with two other methods, FAST/BRISK and ORB. Performance data is presented for the mobile device Xperia Z3 and the desktop Nvidia GTX 660. Our results indicate that the execution times on the Xperia Z3 are insufficient for real-time applications, while desktop execution shows future potential. The classification performance of Harris-Hessian/FREAK indicates that the solution is sensitive to rotation but superior for images with scale variations.

  • 20.
    García Martín, Eva
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Energy Efficiency Analysis of the Very Fast Decision Tree Algorithm. 2017. In: Trends in Social Network Analysis: Information Propagation, User Behavior Modeling, Forecasting, and Vulnerability Assessment / [ed] Rokia Missaoui, Talel Abdessalem, Matthieu Latapy, Cham, Switzerland: Springer, 2017, p. 229-252. Chapter in book (Refereed)
    Abstract [en]

    Data mining algorithms are usually designed to optimize a trade-off between predictive accuracy and computational efficiency. This chapter introduces energy consumption and energy efficiency as important factors to consider during data mining algorithm analysis and evaluation. We conducted an experiment to illustrate how energy consumption and accuracy are affected when varying the parameters of the Very Fast Decision Tree (VFDT) algorithm. These results are compared with a theoretical analysis of the algorithm, indicating that energy consumption is affected by the parameter settings and that it can be reduced significantly while maintaining accuracy.

  • 21.
    García Martín, Eva
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Energy Efficiency in Data Stream Mining. 2015. In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2015, p. 1125-1132. Conference paper (Refereed)
    Abstract [en]

    Data mining algorithms are usually designed to optimize a trade-off between predictive accuracy and computational efficiency. This paper introduces energy consumption and energy efficiency as important factors to consider during data mining algorithm analysis and evaluation. We extended the CRISP (Cross Industry Standard Process for Data Mining) framework to include energy consumption analysis. Based on this framework, we conducted an experiment to illustrate how energy consumption and accuracy are affected when varying the parameters of the Very Fast Decision Tree (VFDT) algorithm. The results indicate that energy consumption can be reduced by up to 92.5% (557 J) while maintaining accuracy.

  • 22.
    García Martín, Eva
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Identification of Energy Hotspots: A Case Study of the Very Fast Decision Tree. 2017. In: GPC 2017: Green, Pervasive, and Cloud Computing / [ed] Au M., Castiglione A., Choo KK., Palmieri F., Li KC., Cham, Switzerland: Springer, 2017, Vol. 10232, p. 267-281. Conference paper (Refereed)
    Abstract [en]

    Large-scale data centers account for a significant share of the energy consumption in many countries. Machine learning technology requires intensive workloads and thus drives the requirements for power and cooling capacity in data centers. It is time to explore green machine learning. The aim of this paper is to profile a machine learning algorithm with respect to its energy consumption and to determine the causes behind this consumption. The first scalable machine learning algorithm able to handle large volumes of streaming data is the Very Fast Decision Tree (VFDT), which outputs competitive results in comparison to algorithms that analyze data from static datasets. Our objectives are to: (i) establish a methodology that profiles the energy consumption of decision trees at the function level, (ii) apply this methodology in an experiment to obtain the energy consumption of the VFDT, (iii) conduct a fine-grained analysis of the functions that consume most of the energy, providing an understanding of that consumption, and (iv) analyze how different parameter settings can significantly reduce the energy consumption. The results show that by addressing the most energy-intensive part of the VFDT, the energy consumption can be reduced by up to 74.3%.

  • 23.
    García Martín, Eva
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Casalicchio, Emiliano
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Boeva, Veselka
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Hoeffding Trees with nmin adaptation. Manuscript (preprint) (Other academic)
    Abstract [en]

    Machine learning software accounts for a significant amount of the energy consumed in data centers. These algorithms are usually optimized towards predictive performance, i.e. accuracy, and scalability. This is the case for data stream mining algorithms. Although these algorithms are adaptive to the incoming data, they have fixed parameters from the beginning of the execution, which leads to energy hotspots. We present dynamic parameter adaptation for data stream mining algorithms to trade off energy efficiency against accuracy during runtime. To validate this approach, we introduce the nmin adaptation method to improve parameter adaptation in Hoeffding trees. This method dynamically adapts the number of instances needed to make a split (nmin) and thereby reduces the overall energy consumption. We created an experiment to compare the Very Fast Decision Tree algorithm (VFDT, the original Hoeffding tree algorithm) with nmin adaptation against the standard VFDT. The results show that VFDT with nmin adaptation consumes up to 89% less energy than the standard VFDT, trading off a few percent of accuracy. Our approach can be used to trade off energy consumption against predictive and computational performance in the strive towards resource-aware machine learning.
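
    The mechanism rests on the Hoeffding bound used for split decisions in Hoeffding trees. The sketch below shows that bound and, under it, how one could estimate the number of instances actually needed to separate the two best split attributes, which is the spirit of nmin adaptation (the paper's exact adaptation rule may differ).

        import math

        def hoeffding_bound(R, delta, n):
            # With probability 1 - delta, the observed mean of n samples
            # lies within epsilon of the true mean (value range R).
            return math.sqrt(R * R * math.log(1.0 / delta) / (2.0 * n))

        def nmin_estimate(g_best, g_second, R, delta):
            # Solve gap = epsilon(n) for n: the instances needed before the
            # bound can separate the two best split attributes, rather than
            # re-checking after a fixed nmin.
            gap = g_best - g_second
            if gap <= 0:
                return None  # attributes indistinguishable so far
            return math.ceil(R * R * math.log(1.0 / delta) / (2.0 * gap * gap))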

  • 24.
    Grahn, Håkan
    Blekinge Institute of Technology, Department of Telecommunications and Mathematics.
    Evaluation of design alternatives for a directory-based cache coherence protocol in shared-memory multiprocessors. 1995. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    In shared-memory multiprocessors, caches are attached to the processors in order to reduce the memory access latency. To keep the memory consistent, a cache coherence protocol is needed. A well known approach is to record which caches have copies of a memory block in a directory and only notify the caches having a copy when a processor modifies the block. Such a protocol is called a directory-based cache coherence protocol. This thesis, which is a summary of seven papers, identifies three problems in a directory-based protocol, and evaluates implementation and performance aspects of some design alternatives. The evaluation methodology is based on program-driven simulation. The write-invalidate policy, which is used in the baseline protocol, forces all other copies of a block to be invalidated when a processor modifies the block. This leads to a cache miss each time a processor accesses an invalidated block. To reduce the number of cache misses, a competitive-update policy is proposed in this thesis. The competitive-update policy is shown to reduce both the read stall and execution times as compared to write-invalidate under a relaxed memory consistency model. However, update-based policies need more buffering and hardware support in the caches. In the baseline protocol, the implementation cost of the directory is proportional to the number of caches. To reduce this cost, an alternative directory organization is proposed which distributes the directory information among the caches sharing the same memory block. To achieve a low write latency, the caches sharing a block are organized in a tree. The caches are linked into the tree in parallel with application execution to achieve a low read latency. The hardware-implemented directory controller in the baseline protocol may lead to high design complexity and implementation cost. This thesis evaluates a design alternative where the controller is implemented using software handlers executed on the compute processor. By using efficient strategies and proper architectural support, this design alternative is shown to be competitive with the baseline protocol. However, the performance of this alternative is more sensitive to other design choices, e.g., block size and latency tolerating techniques, than the baseline protocol.

  • 25. Grahn, Håkan
    Introduction to the Special Issue on the First Swedish Workshop on Multi-Core Computing. 2008. In: SIGARCH Computer Architecture News, ISSN 0163-5964, E-ISSN 1943-5851, Vol. 36, no 5, p. 1-1. Article in journal (Other academic)
    Abstract [en]

    Multicore processors have become the main computing platform for current and future computer systems. This calls for a forum to discuss the challenges and opportunities of both designing and using multicore systems. The objective of this workshop was to bring together researchers and practitioners from academia and industry to present and discuss the recent work in the area of multicore computing. The workshop was the first of its kind in Sweden, and was co-organized by Blekinge Institute of Technology (http://www.bth.se/) and the Swedish Multicore Initiative (http://www.swedishmulticore.se/). The technical program was put together by a distinguished program committee consisting of people from both academia and industry in Sweden. We received 16 extended abstracts. Each abstract was sent to four members of the program committee. In total, we collected 64 review reports. The abstracts were judged based on their merits in terms of relevance to the workshop, significance and originality, as well as the scientific and presentation quality. Based on the reviews, the program committee decided to accept 12 papers for inclusion in the workshop, giving an acceptance rate of 75%. The accepted papers cover a broad range of topics, such as programming techniques and languages, compiler and library support, coherence and consistency issues, and verification techniques for multicore systems. This workshop was the result of many people’s effort. First of all, I would like to thank Monica Nilsson and Madeleine Rovegård for their help with many practical arrangements and organizational issues around the workshop. Then, I would like to thank the program committee for their dedicated and hard work, especially finishing all reviews on time despite the short time frame so we could send out author notifications as scheduled. I would also like to thank the steering committee of the Sweden Multicore Initiative for valuable and fruitful discussions about how to make this workshop successful. Finally, I would like to thank the SIGARCH Computer Architecture News editor Doug DeGroot for his help when compiling this special issue. On behalf of the program committee, I hope you find the included papers interesting. The workshop will continue as an annual activity within the Swedish Multicore Initiative.

  • 26. Grahn, Håkan
    Proceedings of the 1st Swedish Workshop on Multi-Core Computing. 2008. Report (Other academic)
    Abstract [en]

    Multicore processors have become the main computing platform for current and future computer systems. This calls for a forum to discuss the challenges and opportunities of both designing and using multicore systems. The objective of this workshop is to bring together researchers and practitioners from academia and industry to present and discuss recent work in the area of multicore computing. The workshop is the first of its kind in Sweden, and it is co-organized by Blekinge Institute of Technology and the Swedish Multicore Initiative (http://www.sics.se/multicore/). The technical program was put together by a distinguished program committee consisting of people from both academia and industry in Sweden. We received 16 extended abstracts. Each abstract was sent to four members of the program committee. In total, we collected 64 review reports. The abstracts were judged on their merits in terms of relevance to the workshop, significance and originality, as well as scientific and presentation quality. Based on the reviews, the program committee decided to accept 12 papers for inclusion in the workshop, giving an acceptance rate of 75%. The accepted papers cover a broad range of topics, such as programming techniques and languages, compiler and library support, coherence and consistency issues, and verification techniques for multicore systems.

  • 27.
    Grahn, Håkan
    Blekinge Institute of Technology, School of Computing.
    Transactional Memory. 2010. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 70, no 10, p. 993-1008. Article in journal (Refereed)
    Abstract [en]

    Current and future processor generations are based on multicore architectures where the performance increase comes from an increasing number of cores on a chip. In order to utilize the performance potential of multicore architectures the programs also need to be parallel, but writing parallel programs is a non-trivial task. Transactional memory tries to ease parallel program development by providing atomic and isolated execution of code sequences, enabling software composability and protected access to shared data. In addition, transactional memory has the ability to execute atomic code sequences in parallel as long as no data conflicts occur. Transactional memory implementation proposals exist for both hardware and software, as well as hybrid solutions. This special issue on transactional memory introduces transactional memory as a concept, presents an overview of some of the most important approaches so far, and finally includes five articles that advance the state of the art in transactional memory research.

  • 28. Grahn, Håkan
    et al.
    Holgersson, Marcus
    An Approach for Performance Measurements in Distributed CORBA Applications. 2002. Conference paper (Refereed)
    Abstract [en]

    One way to construct distributed systems is to use a communication model with distributed objects, such as CORBA (Common Object Request Broker Architecture). Distributed objects give many advantages but suffer from some performance problems. In order to handle the performance problems it is important to find where in the event chain the delays occur. Therefore, a tool for performance measurement and for identifying the performance bottlenecks in a distributed system should be of great help. In this paper we present an approach for performance measurements in distributed CORBA applications. Our approach is based on interceptors, which is the technique we use for inserting measurement points. This approach gives sufficient information for identifying many performance problems. In order to verify our approach, a prototype tool for profiling and performance measurements was constructed. A presentation program was built to make the captured information more readable. The tool and presentation programs show the execution flow of the system in different call graphs and also produce call statistics at different levels. Finally, the tool was tested and verified in a distributed environment.

  • 29. Grahn, Håkan
    et al.
    Lavesson, Niklas
    Lapajne, Mikael Hellborg
    Slat, Daniel
    A CUDA Implementation of Random Forests: Early Results. 2010. Conference paper (Refereed)
    Abstract [en]

    Machine learning algorithms are frequently applied in data mining applications. Many of the tasks in this domain concern high-dimensional data. Consequently, these tasks are often complex and computationally expensive. This paper presents a GPU-based parallel implementation of the Random Forests algorithm. In contrast to previous work, the proposed algorithm is based on the compute unified device architecture (CUDA). An experimental comparison between the CUDA-based algorithm (CudaRF), and state-of-the-art parallel (FastRF) and sequential (LibRF) Random forests algorithms shows that CudaRF outperforms both FastRF and LibRF for the studied classification task.

  • 30. Grahn, Håkan
    et al.
    Lavesson, Niklas
    Lapajne, Mikael Hellborg
    Slat, Daniel
    CudaRF: A CUDA-based Implementation of Random Forests. 2011. Conference paper (Refereed)
    Abstract [en]

    Machine learning algorithms are frequently applied in data mining applications. Many of the tasks in this domain concern high-dimensional data. Consequently, these tasks are often complex and computationally expensive. This paper presents a GPU-based parallel implementation of the Random Forests algorithm. In contrast to previous work, the proposed algorithm is based on the compute unified device architecture (CUDA). An experimental comparison between the CUDA-based algorithm (CudaRF), and state-of-the-art Random Forests algorithms (FastRF and LibRF) shows that CudaRF outperforms both FastRF and LibRF for the studied classification task.

  • 31. Grahn, Håkan
    et al.
    Stenström, Per
    A Comparative Evaluation of Hardware-Only and Software-Only Directory Protocols in Shared-Memory Multiprocessors. 2004. In: Journal of Systems Architecture, ISSN 1383-7621, Vol. 50, no 9, p. 537-561. Article in journal (Refereed)
    Abstract [en]

    The hardware complexity of hardware-only directory protocols in shared-memory multiprocessors has motivated many researchers to emulate directory management by software handlers executed on the compute processors, called software-only directory protocols. In this paper, we evaluate the performance and design trade-offs between these two approaches in the same architectural simulation framework driven by eight applications from the SPLASH-2 suite. Our evaluation reveals some common case operations that can be supported by simple hardware mechanisms and can make the performance of software-only directory protocols competitive with that of hardware-only protocols. These mechanisms aim at either reducing the software handler latency or hiding it by overlapping it with the message latencies associated with internode memory transactions. Further, we evaluate the effects of cache block sizes between 16 and 256 bytes as well as two different page placement policies. Overall, we find that a software-only directory protocol enhanced with these mechanisms can reach between 63% and 97% of the baseline hardware-only protocol performance at a lower design complexity.

  • 32.
    Grahn, Håkan
    et al.
    Blekinge Institute of Technology, Department of Software Engineering and Computer Science.
    Stenström, Per
    Comparative evaluation of latency-tolerating and -reducing techniques for hardware-only and software-only directory protocols. 2000. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 60, no 7, p. 807-834. Article in journal (Refereed)
    Abstract [en]

    We study in this paper how effective latency-tolerating and -reducing techniques are at cutting the memory access times for shared-memory multiprocessors with directory cache protocols managed by hardware and software. A critical issue for the relative efficiency is how many protocol operations such techniques trigger. This paper presents a framework that makes it possible to reason about the expected relative efficiency of a latency-tolerating or -reducing technique by focusing on whether the technique increases, decreases, or does not change the number of protocol operations at the memory module. Since software-only directory protocols handle these operations in software, they will perform relatively worse unless the technique reduces the number of protocol operations. Our experimental results from detailed architectural simulations driven by six applications from the SPLASH-2 parallel program suite confirm this expectation. We find that while prefetching performs relatively worse on software-only directory protocols due to useless prefetches, there are examples of protocol optimizations, e.g., optimizations for migratory data, that do relatively better on software-only directory protocols. Overall, this study shows that latency-tolerating techniques must be more carefully selected for software-centric than for hardware-centric implementations of distributed shared-memory systems.

  • 33. Grahn, Håkan
    et al.
    Stenström, Per
    Efficient Strategies for Software-Only Directory Protocols in Shared-Memory Multiprocessors. 1995. Conference paper (Refereed)
    Abstract [en]

    The cost, complexity, and inflexibility of hardware-based directory protocols motivate us to study the performance implications of protocols that emulate directory management using software handlers executed on the compute processors. An important performance limitation of such software-only protocols is that software latency associated with directory management ends up on the critical memory access path for read miss transactions. We propose five strategies that support efficient data transfers in hardware whereas directory management is handled at a slower pace in the background by software handlers. Simulations show that this approach can remove the directory-management latency from the memory access path. Whereas the directory is managed in software, the hardware mechanisms must access the memory state in order to enable data transfers at a high speed. Overall, our strategies reach between 60% and 86% of the hardware-based protocol performance.

  • 34.
    Grahn, Håkan
    et al.
    Blekinge Institute of Technology, Department of Software Engineering and Computer Science.
    Stenström, Per
    Evaluation of a Competitive-Update Cache Coherence Protocol with Migratory Data Detection. 1996. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 39, no 2, p. 168-180. Article in journal (Refereed)
    Abstract [en]

    Although directory-based write-invalidate cache coherence protocols have the potential to improve the performance of large-scale multiprocessors, coherence misses limit the processor utilization. Therefore, so-called competitive-update protocols, i.e., hybrid protocols that on a per-block basis dynamically switch between write-invalidate and write-update, have been considered as a means to reduce the coherence miss rate, and have been shown to be a better coherence policy for a wide range of applications. Unfortunately, such protocols may cause high traffic peaks for applications with extensive use of migratory objects. These traffic peaks can offset the performance gain of a reduced miss rate if the network bandwidth is not sufficient. We propose in this study to extend a competitive-update protocol with a previously published adaptive mechanism that can dynamically detect migratory objects and reduce the coherence traffic they cause. Detailed architectural simulations based on five scientific and engineering applications show that this adaptive protocol outperforms a write-invalidate protocol by reducing the miss rate and the bandwidth needed by up to 71% and 26%, respectively.

  • 35. Grahn, Håkan
    et al.
    Stenström, Per
    Relative Performance of Hardware and Software-Only Directory Protocols Under Latency Tolerating and Reducing Techniques. 1997. Conference paper (Refereed)
    Abstract [en]

    In both hardware-only and software-only directory protocols the performance is often limited by memory access stall times. To increase the performance, several latency tolerating and reducing techniques have been proposed and shown effective for hardware-only directory protocols. For software-only directory protocols, the efficiency of a technique depends not only on how effective it is as seen by the local processor, but also on how it impacts the software handler execution overhead in the node where a memory block is allocated. Based on architectural simulations and case studies of three techniques, we find that prefetching can degrade the performance of software-only directory protocols due to useless prefetches. A relaxed memory consistency model hides all write latency for software-only directory protocols, but the software handler overhead is virtually unaffected and now constitutes a larger portion of the execution time. Overall, latency tolerating techniques for software-only directory protocols must be chosen with more care than for hardware-only directory protocols.

  • 36. Grahn, Håkan
    et al.
    Stenström, Per
    Dubois, Michel
    Implementation and Evaluation of Update-Based Cache Protocols Under Relaxed Memory Consistency Models. 1995. In: Future Generations Computer Systems, ISSN 0167-739X, Vol. 11, no 3, p. 247-271. Article in journal (Refereed)
    Abstract [en]

    Invalidation-based cache coherence protocols have been extensively studied in the context of large-scale shared-memory multiprocessors. Under a relaxed memory consistency model, most of the write latency can be hidden, whereas cache misses still incur a severe performance problem. By contrast, update-based protocols have the potential to reduce both write and read penalties under relaxed memory consistency models, because coherence misses can be completely eliminated. This paper compares update- and invalidation-based protocols for their ability to reduce or hide memory access latencies and for their ease of implementation under relaxed memory consistency models.

  • 37.
    Grahn, Håkan
    et al.
    Blekinge Institute of Technology, School of Computing.
    Törnquist Krasemann, Johanna
    Blekinge Institute of Technology, School of Computing.
    A Parallel Re-Scheduling Algorithm for Railway Traffic Disturbance Management - Initial Results. 2011. Conference paper (Refereed)
    Abstract [en]

    In Sweden, railway traffic and the demand for track capacity have increased significantly in recent years, resulting in a high traffic density where even small disturbances propagate. This makes it hard for the traffic managers to get an overview of the consequences of disturbances and of their own decisions when re-scheduling the traffic. In previous research, we developed and experimentally evaluated a greedy depth-first search algorithm. This algorithm aims to support the traffic managers by computing alternative re-scheduling solutions in order to minimize the train delays. The simulation experiments were based on real traffic data, and the algorithm proved to be very effective for many types of disturbances, delivering optimal or close-to-optimal solutions in a few seconds. However, the experiments also indicated a need for improvements when solving more severe disturbances related to major infrastructure failures. The improvements primarily concern the need to explore larger parts of the search space quickly and to branch more effectively by avoiding non-promising nodes. This paper presents results from an analysis of where and when successful branching decisions are made. The analysis showed that the successful branching decisions were generally made quite far down in the search space tree, but somewhat higher up for more severe disturbance scenarios. We also present an improved version of the greedy algorithm and a parallel implementation of it. The parallelization is composed of eight threads, each allocated to its own processor, that start branching at the highest branching level. The experimental results show that this improves the solutions for difficult disturbance scenarios.

  • 38. Hussain, Sajid
    et al.
    Grahn, Håkan
    Fast kd-Tree Construction for 3D-rendering Algorithms like Ray Tracing. 2007. Conference paper (Refereed)
    Abstract [en]

    Many computer graphics rendering algorithms and techniques use ray tracing for the generation of natural and photo-realistic images. The efficiency of ray tracing algorithms depends, among other things, upon the data structures used in the background. kd-trees are among the most commonly used data structures for accelerating ray tracing algorithms. Data structures using cost optimization techniques based upon Surface Area Heuristics (SAH) are generally considered the best and of high quality. During the last decade, the trend has moved from off-line rendering towards real-time rendering with the introduction of high-speed computers and dedicated Graphical Processing Units (GPUs). In this situation, SAH-optimized structures have been considered too slow to allow real-time rendering of complex scenes. Our goal is to demonstrate an accelerated approach to building SAH-based data structures for use in real-time rendering algorithms. The quality of SAH-based data structures heavily depends upon split-plane locations, and the major bottleneck of SAH techniques is the time consumed to find those optimum split locations. We present a parabolic interpolation technique combined with a golden section search criterion for predicting kd-tree split plane locations. The resulting structure is built 30% faster with 6% quality degradation compared to a standard SAH approach, for reasonably complex scenes with around 170k polygons.
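
    For reference, a golden section search over a unimodal 1D cost is only a few lines of Python; the paper couples this kind of search with parabolic interpolation (Brent-style) and applies it to the SAH cost along a split axis (the quadratic cost below is only a stand-in for the real surface-area-weighted cost).

        import math

        INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618

        def golden_section_min(f, a, b, tol=1e-4):
            # Shrink [a, b] around the minimum of a unimodal function f.
            c, d = b - INV_PHI * (b - a), a + INV_PHI * (b - a)
            fc, fd = f(c), f(d)
            while (b - a) > tol:
                if fc < fd:
                    b, d, fd = d, c, fc
                    c = b - INV_PHI * (b - a)
                    fc = f(c)
                else:
                    a, c, fc = c, d, fd
                    d = a + INV_PHI * (b - a)
                    fd = f(d)
            return 0.5 * (a + b)

        cost = lambda x: 0.3 * x * x - 0.2 * x + 1.0  # stand-in for SAH
        print(golden_section_min(cost, 0.0, 1.0))     # about 1/3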

  • 39. Hussain, Sajid
    et al.
    Grahn, Håkan
    Ranking Journals, Conferences and Authors in Computer Graphics: A Fuzzy Reasoning. 2008. Conference paper (Refereed)
    Abstract [en]

    Rankings of different research bodies are of particular interest to academia, and bibliometrics is used to measure the quality of these research bodies. Different factors affecting this quality have been proposed. In this paper, we demonstrate a new approach based on fuzzy models that takes different proposed factors into account to assess the overall quality. Our model refines the ranking by adding more factors that affect this quality, thus improving information retrieval based on human reasoning. A dimensionless index called the Fuzzy Index (FI) is proposed and used to re-order previously ranked research bodies. We demonstrate the application of our FI in ranking journals, conferences, and authors in the field of computer graphics in particular and computer science in general.

  • 40. Hussain, Sajid
    et al.
    Grahn, Håkan
    Tracking Data Structures Coherency in Animated Ray Tracing for Real-Time 3D-Rendering. 2008. Conference paper (Refereed)
    Abstract [en]

    Ray tracing is a well-known method for photorealistic image synthesis, volume visualization, and rendering. Over the last decade the method has been adopted throughout the research community around the world. With the advent of high-speed processing units, the method has been moving from offline rendering towards real-time rendering. The success of ray tracing algorithms lies in the use of acceleration data structures and the modern processing power of CPUs and GPUs. The kd-tree is one of the most widely used data structures, based on surface area heuristics (SAH). The major bottleneck in kd-tree construction is the time consumed to find optimum split locations. In this paper, we propose a prediction algorithm for animated ray tracing based on Kalman filtering. The algorithm successfully predicts the split locations for the next consecutive frame in the animation sequence, thus giving good initial starting points for one-dimensional search algorithms that find the optimum split locations, in our case parabolic interpolation combined with golden section search. With our technique implemented, we have reduced the running kd-tree construction time by between 78% and 87% for dynamic scenes with 16.8K and 252K polygons, respectively.

  • 41. Hussain, Sajid
    et al.
    Grahn, Håkan
    Tracking Data Structures Coherency in Animated Ray Tracing: Kalman and Wiener Filters Approach2008Conference paper (Refereed)
    Abstract [en]

    The generation of natural and photorealistic images in computer graphics normally makes use of a well-known method called ray tracing. Ray tracing has been adopted as a primary image rendering method in the research community over the last few years. With the advent of today's high-speed processors, the method has received much attention over the last decade. The computing power of modern GPUs/CPUs and accelerated data structures are behind the success of ray tracing algorithms. The kd-tree is one of the most widely used data structures and is built based on the surface area heuristic (SAH). The major bottleneck in kd-tree construction is the time consumed to find optimum split locations. In this paper, we propose a prediction algorithm for animated ray tracing based on Kalman and Wiener filters. Both algorithms successfully predict the split locations for the next consecutive frame in the animation sequence, giving good initial starting points for one-dimensional search algorithms to find optimum split locations – in our case parabolic interpolation combined with golden section search. With our technique implemented, we have reduced the “running kd-tree construction” time by between 78% and 87% for dynamic scenes with 16.8K and 252K polygons, respectively.
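
    The Kalman variant is sketched under the previous entry; to complement it, the following is a minimal Wiener-style predictor: a linear FIR filter whose taps are fit by least squares over a window of past split locations. The filter order and the data are illustrative assumptions, not the paper's design.

```python
# Wiener-style FIR predictor sketch: estimate the next split location as a
# linear combination of the last k observed splits, with the taps fit by
# least squares over a training window. Order k and data are illustrative.
import numpy as np

def fit_fir_predictor(history, k=3):
    """Solve for FIR taps w minimizing ||X w - y||^2 (Wiener solution)."""
    X = np.array([history[i:i + k] for i in range(len(history) - k)])
    y = np.array(history[k:])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict_next(history, w):
    """Apply the taps to the most recent window of observations."""
    return float(np.dot(history[-len(w):], w))

# Hypothetical split locations from previous animation frames.
splits = [1.00, 1.04, 1.09, 1.15, 1.20, 1.26, 1.31]
w = fit_fir_predictor(splits)
print(f"predicted split for next frame: {predict_next(splits, w):.3f}")
```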

  • 42. Hussain, Sajid
    et al.
    Grahn, Håkan
    Persson, Jan A.
    A Minimum Vertex Cover Feature-Preserving Mesh Simplification2008Conference paper (Refereed)
    Abstract [en]

    The complexity of meshes in computer graphics rendering algorithms like ray tracing hinders the performance of these algorithms. Therefore, the need arises to reduce and approximate meshes while preserving the salient features of the shape. The mesh simplification problem can in some cases be considered an initial-value problem, where the quality of the simplified mesh heavily depends on the initial selection of vertices for contraction. In this short paper, we present ongoing work in mesh simplification that uses a greedy algorithm for the minimum vertex cover problem to select vertex contraction pairs that preserve salient features in the simplified mesh. Initial experiments show promising results with preserved salient features.
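
    The classic greedy heuristic for minimum vertex cover, which the paper builds on for selecting contraction candidates, can be sketched as follows: repeatedly take the highest-degree vertex and remove its incident edges. The toy edge list stands in for a mesh's edge graph; nothing here reproduces the paper's cost function.

```python
# Classic greedy minimum-vertex-cover heuristic on a mesh edge graph:
# repeatedly pick the highest-degree vertex, add it to the cover, and
# remove its incident edges. The mesh data below is hypothetical.
from collections import defaultdict

def greedy_vertex_cover(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    cover = set()
    while any(adj.values()):
        u = max(adj, key=lambda w: len(adj[w]))   # highest remaining degree
        cover.add(u)
        for v in list(adj[u]):                    # remove incident edges
            adj[v].discard(u)
        adj[u].clear()
    return cover

# Edges of a toy mesh patch (vertex ids are illustrative).
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)]
print(sorted(greedy_vertex_cover(edges)))
```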

  • 43. Hussain, Sajid
    et al.
    Grahn, Håkan
    Persson, Jan A.
    Feature-preserving Mesh Simplification: A Vertex Cover Approach2008Conference paper (Refereed)
    Abstract [en]

    In computer graphics image synthesis algorithms like ray tracing, mesh complexity decreases the performance of the algorithms. Therefore, the need arises to reduce the complexity of these meshes while preserving the salient features of the shape. The initial selection of vertices for mesh simplification strongly affects the quality of the simplified meshes. In this paper, we present a greedy approach to selecting initial vertex contraction pairs that preserve salient features in the simplified meshes. The greedy algorithm exploits the property of meshes that vertices forming small features have fewer edges. The technique selects vertices connected by a large number of edges and makes them potential candidates for contraction according to a given cost function. The purpose is to first simplify those regions that are rich in triangles and to preserve the small details of the shape constructed from a small number of triangles. Our technique preserves very small details in the shape even after considerable simplification, compared to other existing techniques. Initial experiments show promising results with preserved salient features.
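
    A minimal sketch of the degree-driven selection described above: candidate contraction pairs are ordered so that edges whose endpoints have high degree (triangle-rich regions) come first, with edge length as a placeholder tie-breaking cost. The mesh data and the cost function are illustrative assumptions, not the paper's.

```python
# Illustrative sketch: order candidate contraction pairs so that edges
# whose endpoints have high degree are contracted first, breaking ties by
# a placeholder cost (edge length). Not the paper's actual cost function.
import heapq
from collections import defaultdict

def contraction_order(vertices, edges):
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    heap = []
    for u, v in edges:
        length = sum((a - b) ** 2
                     for a, b in zip(vertices[u], vertices[v])) ** 0.5
        # Negate the degree sum so densely connected pairs come out first.
        heapq.heappush(heap, (-(degree[u] + degree[v]), length, u, v))
    return [(u, v) for _, _, u, v
            in (heapq.heappop(heap) for _ in range(len(heap)))]

# Toy mesh: vertex id -> 3-D position (hypothetical data).
vertices = {0: (0, 0, 0), 1: (1, 0, 0), 2: (0.5, 1, 0),
            3: (1.5, 1, 0), 4: (1, 2, 0)}
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)]
print(contraction_order(vertices, edges)[:3])   # first pairs to contract
```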

  • 44. Iqbal, Syed Muhammad Zeeshan
    et al.
    Grahn, Håkan
    Krasemann, Johanna Törnquist
    A Parallel DFS Algorithm for Train Re-scheduling During Traffic Disturbances — Early Results2011Conference paper (Refereed)
    Abstract [en]

    Railways are an important part of the infrastructure in most countries. As railway networks become more and more saturated, even small traffic disturbances can propagate and have severe consequences. In this paper, the train re-scheduling problem is studied with the aim of minimizing the final delay of all trains in a scenario. We propose a parallel algorithm based on a depth-first search branch-and-bound strategy. The parallel algorithm is compared to a sequential algorithm in terms of the quality of the solution and the number of nodes evaluated, as well as to optimal solutions found by Cplex, using 20 disturbance scenarios. Our parallel algorithm significantly improves the solution for 5 out of 20 disturbance scenarios, compared to the sequential algorithm.
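
    The following is a schematic, sequential version of the depth-first branch-and-bound skeleton that such an algorithm parallelizes: nodes are partial schedules, children extend them, and subtrees whose lower bound meets the incumbent cost are pruned. The toy permutation problem merely stands in for ordering trains through a conflict point; it is not the paper's model.

```python
# Schematic depth-first branch-and-bound skeleton. Nodes are partial
# schedules; branches whose lower bound cannot beat the best known total
# delay are pruned. The problem encoding below is a toy stand-in.

def dfs_branch_and_bound(children, lower_bound, is_complete, cost, root):
    best = {"cost": float("inf"), "node": None}

    def dfs(node):
        if lower_bound(node) >= best["cost"]:
            return                        # prune: cannot improve
        if is_complete(node):
            best["cost"], best["node"] = cost(node), node
            return
        for child in children(node):
            dfs(child)

    dfs(root)
    return best["node"], best["cost"]

# Toy usage: pick a permutation of 'delays' minimizing a weighted sum,
# standing in for ordering trains through a conflict point.
delays = [3, 1, 2]
result = dfs_branch_and_bound(
    children=lambda n: [n + (d,) for d in delays if d not in n],
    lower_bound=lambda n: sum(n),         # admissible toy bound
    is_complete=lambda n: len(n) == len(delays),
    cost=lambda n: sum((i + 1) * d for i, d in enumerate(n)),
    root=(),
)
print(result)
```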

  • 45.
    Iqbal, Syed Muhammad Zeeshan
    et al.
    Blekinge Institute of Technology, School of Computing.
    Grahn, Håkan
    Blekinge Institute of Technology, School of Computing.
Törnquist Krasemann, Johanna
    Blekinge Institute of Technology, School of Computing.
    A parallel heuristic for fast train dispatching during railway traffic disturbances: Early results2012Conference paper (Refereed)
    Abstract [en]

    Railways are an important part of the infrastructure in most countries. As railway networks become more and more saturated, even small traffic disturbances can propagate and have severe consequences. Therefore, efficient re-scheduling support for the traffic managers is needed. In this paper, the real-time train re-scheduling problem is studied with the aim of minimizing the total delay, subject to a set of safety and operational constraints. We propose a parallel greedy algorithm based on a depth-first branch-and-bound search strategy. A number of comprehensive numerical experiments are conducted to compare the parallel implementation to the sequential implementation of the same algorithm in terms of the quality of the solution and the number of nodes evaluated. The comparison is based on 20 disturbance scenarios from three different types of disturbances. Our results show that the parallel algorithm (i) efficiently covers a larger portion of the search space by exchanging information about improvements, and (ii) finds better solutions for more complicated disturbances such as infrastructure problems. The parallel implementation significantly improves the solution for 5 out of 20 disturbance scenarios, compared to the sequential algorithm.
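
    The key parallel idea, exchanging information about improvements, can be sketched as workers exploring disjoint subtrees while sharing the incumbent best cost, so a better schedule found by one process tightens pruning in all others. The encoding below reuses the toy problem from the previous sketch and is not the paper's model.

```python
# Sketch of the parallel idea: workers search disjoint subtrees while
# sharing the incumbent best cost, so an improvement found by one worker
# tightens pruning in all others. Toy problem, not the paper's model.
import multiprocessing as mp

def worker(subtree_root, delays, best, lock, results):
    def dfs(n):
        if sum(n) >= best.value:                 # prune on shared bound
            return
        if len(n) == len(delays):
            c = sum((i + 1) * d for i, d in enumerate(n))
            with lock:
                if c < best.value:
                    best.value = c               # publish the improvement
                    results.append((c, n))
            return
        for d in delays:
            if d not in n:
                dfs(n + (d,))
    dfs(subtree_root)

if __name__ == "__main__":
    delays = [3, 1, 2]
    best = mp.Value("d", float("inf"))           # shared incumbent cost
    lock = mp.Lock()
    results = mp.Manager().list()
    # Partition the search space by first decision, one subtree per process.
    procs = [mp.Process(target=worker,
                        args=((d,), delays, best, lock, results))
             for d in delays]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(min(results))                          # best (cost, schedule)
```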

  • 46.
    Iqbal, Syed Muhammad Zeeshan
    et al.
    Blekinge Institute of Technology, School of Computing.
    Grahn, Håkan
    Blekinge Institute of Technology, School of Computing.
    Törnquist Krasemann, Johanna
    Blekinge Institute of Technology, School of Computing.
    Multi-Strategy Based Train Re-Scheduling During Railway Traffic Disturbances2013Conference paper (Refereed)
    Abstract [en]

    Disruptions and delays in railway traffic networks due to different types of disturbances are a frequent problem in many countries. When disruptions occur, the train traffic dispatchers may need to re-schedule the traffic, which is often a very demanding and complicated task. To support the train traffic dispatchers, we propose to use a parallelized multi-strategy greedy algorithm. This paper presents three different parallelization approaches: (i) Single Strategy with a Partitioned List (i.e., the parallel processes originate from different starting points), (ii) Multiple Strategies with a Non-Partitioned List, and (iii) Multiple Strategies with a Partitioned List. We present an evaluation for a busy part of the Swedish railway network based on performance metrics such as the sum of all train delays at their final destinations and the number of delayed trains. The results show that parallelization helps to improve the solution quality. The parallel approach (iii), which combines all re-scheduling strategies with a partitioned list, performs best among the three parallel approaches when minimizing the total final delay. The main conclusion is that the multi-strategy parallel approach significantly improves the solution for 11 out of 20 disturbance scenarios, compared to the sequential re-scheduling algorithm. The approach also provides increased stability, since it always delivers a feasible solution in a short time.
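
    A schematic of approach (iii): several placeholder re-scheduling strategies are each applied to their own partition of a candidate list, in parallel, and the best resulting schedule is kept. The strategies, the delay metric, and the data are all invented for illustration and do not reproduce the paper's dispatching model.

```python
# Schematic of "Multiple Strategies with a Partitioned List": every
# (strategy, partition) pair runs in its own process; the best feasible
# schedule wins. Strategies and the delay metric are placeholders.
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def strategy_fifo(trains):                       # placeholder strategy
    return sorted(trains)

def strategy_longest_first(trains):              # placeholder strategy
    return sorted(trains, reverse=True)

def total_delay(order):
    return sum((i + 1) * t for i, t in enumerate(order))   # toy metric

def run(args):
    strategy, partition = args
    order = strategy(partition)
    return total_delay(order), order

if __name__ == "__main__":
    candidates = [3, 1, 2, 5, 4]
    partitions = [candidates[:3], candidates[2:]]        # partitioned list
    strategies = [strategy_fifo, strategy_longest_first] # multiple strategies
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(run, product(strategies, partitions)))
    print(min(results))   # best (delay, order) over all strategy/partition pairs
```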

  • 47. Iqbal, Syed Muhammad Zeeshan
    et al.
    Liang, Yuchen
    Grahn, Håkan
    ParMiBench: An Open-Source Benchmark for Embedded Multiprocessor Systems2010In: IEEE Computer Architecture Letters, ISSN 1556-6056, Vol. 9, no 2, p. 45-48Article in journal (Refereed)
    Abstract [en]

    Multicore processors are the main computing platform in laptops, desktops, and servers today, and are also making their way into the embedded systems market. Using benchmarks is a common approach to evaluate the performance of a system. However, benchmarks for embedded systems have so far been either targeted at uni-processor environments, e.g., MiBench, or commercial, e.g., MultiBench by EEMBC. In this paper, we propose and implement an open-source benchmark, ParMiBench, targeted at multiprocessor-based embedded systems. ParMiBench consists of parallel implementations of seven compute-intensive algorithms from the uni-processor benchmark suite MiBench. The applications are selected from four domains: Automation and Industry Control, Network, Office, and Security.

  • 48. Kågström, Simon
    et al.
    Grahn, Håkan
    Lundberg, Lars
    Automatic Low Overhead Program Instrumentation with the LOPI framework2005Conference paper (Refereed)
  • 49. Kågström, Simon
    et al.
    Grahn, Håkan
    Lundberg, Lars
    Cibyl: an Environment for Language Diversity on Mobile Devices2007Conference paper (Refereed)
    Abstract [en]

    With an estimated installed base of around 1 billion units, the Java J2ME platform is one of the largest development targets available. For mobile devices, J2ME is often the only available environment. For the very large body of software written in C and other languages, this means difficult and costly porting to another language to support J2ME devices. This paper presents the Cibyl programming environment, which allows existing code written in C and other languages supported by GCC to be recompiled into Java bytecode and run with close to native Java performance on J2ME devices. Cibyl translates compiled MIPS binaries into Java bytecode. In contrast to other approaches, Cibyl supports the full C language, is based on unmodified standard tools, and does not rely on source code conversion. To achieve good performance, Cibyl employs extensions to the MIPS architecture to support low-overhead calls to native Java functionality, and uses knowledge of the MIPS ABI to avoid computing unused values and transferring unnecessary registers. An evaluation on multiple virtual machines shows that Cibyl achieves performance similar to native Java, with results ranging from a slowdown of around to a speedup of over 9, depending on the JVM and the benchmark.
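
    Cibyl's actual translator is not reproduced here; the toy sketch below only illustrates the core idea of binary translation from a register machine to a stack machine: each MIPS register maps to a local-variable slot, and an R-type instruction becomes a load/load/op/store sequence of JVM-style stack instructions. The register map and the supported opcodes are illustrative assumptions.

```python
# Illustrative only: translating register-based MIPS arithmetic into
# stack-machine code, as a binary translator like Cibyl does for Java
# bytecode. Registers become local slots; each R-type op becomes a
# load/load/op/store sequence. Not Cibyl's actual implementation.

# Map a few MIPS register names to local-variable slots (hypothetical).
REG_SLOT = {"$a0": 0, "$a1": 1, "$v0": 2}

def translate_rtype(op, rd, rs, rt):
    """Translate e.g. 'addu $v0, $a0, $a1' into stack-machine instructions."""
    stack_op = {"addu": "iadd", "subu": "isub", "and": "iand"}[op]
    return [f"iload {REG_SLOT[rs]}",    # push rs
            f"iload {REG_SLOT[rt]}",    # push rt
            stack_op,                   # combine the top two stack values
            f"istore {REG_SLOT[rd]}"]   # pop the result into rd

for line in translate_rtype("addu", "$v0", "$a0", "$a1"):
    print(line)
```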

  • 50. Kågström, Simon
    et al.
    Grahn, Håkan
    Lundberg, Lars
    Experiences from Implementing Multiprocessor Support for an Industrial Operating System Kernel2005Conference paper (Refereed)