  • 1.
    Ahmed, Soban
    et al.
    Natl Univ Comp & Emerging Sci, PAK.
    Bhatti, Muhammad Tahir
    Natl Univ Comp & Emerging Sci, PAK.
    Khan, Muhammad Gufran
    Natl Univ Comp & Emerging Sci, PAK.
    Lövström, Benny
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mathematics and Natural Sciences.
    Shahid, Muhammad
    Natl Univ Comp & Emerging Sci, PAK.
    Development and Optimization of Deep Learning Models for Weapon Detection in Surveillance Videos (2022). In: Applied Sciences, E-ISSN 2076-3417, Vol. 12, no 12, article id 5772. Article in journal (Refereed)
    Abstract [en]

    Featured Application: This work applies computer vision and deep learning technology to develop a real-time weapon detector system, tested on different computing devices for large-scale deployment. Weapon detection in CCTV camera surveillance videos is a challenging task, and its importance is increasing because of the availability and easy access of weapons in the market. This becomes a big problem when weapons go into the wrong hands and are misused. Advances in computer vision and object detection are enabling us to detect weapons in live videos without human intervention and, in turn, intelligent decisions can be made to protect people from dangerous situations. In this article, we develop and present an improved real-time weapon detection system that shows a higher mean average precision (mAP) score and better inference time performance compared to previously proposed approaches in the literature. Using a custom weapons dataset, we implemented a state-of-the-art Scaled-YOLOv4 model that resulted in a 92.1 mAP score and 85.7 frames per second (FPS) on a high-performance GPU (RTX 2080TI). Furthermore, to achieve the benefits of lower latency, higher throughput, and improved privacy, we optimized our model for implementation on a popular edge-computing device (Jetson Nano GPU) with the TensorRT network optimizer. We also performed a comparative analysis of the previous weapon detector and our presented model on different CPU and GPU machines, making the selection of model and computing device for deployment in a real-time scenario easier for users. The analysis shows that our presented models yield improved mAP scores on high-performance GPUs (such as the RTX 2080TI), as well as on low-cost edge-computing GPUs (such as the Jetson Nano), for weapon detection in live CCTV camera surveillance videos.
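
    The mAP scores quoted above come from matching predicted boxes to ground-truth boxes by Intersection over Union (IoU). The following minimal Python sketch (not the authors' code; the boxes and threshold are illustrative) shows that ingredient:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection counts as a true positive when its IoU with an unmatched
# ground-truth box exceeds the threshold (0.5 is the usual convention).
print(iou((10, 10, 50, 50), (30, 30, 70, 70)))  # ~0.14, i.e. no match at 0.5
```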

  • 2.
    Benhamza, Hiba
    et al.
    Mohamed Khider University, DZA.
    Djeffal, Abdelhamid
    Mohamed Khider University, DZA.
    Cheddad, Abbas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Image forgery detection review (2021). In: Proceedings - 2021 International Conference on Information Systems and Advanced Technologies, ICISAT 2021, Institute of Electrical and Electronics Engineers Inc., 2021. Conference paper (Refereed)
    Abstract [en]

    With the widespread use of digital documents in administrations, the fabrication and use of forged documents have become a serious problem. This paper presents a study and classification of the most important works on image and document forgery detection. The classification is based on document type, forgery type, detection method, validation dataset, evaluation metrics and obtained results. Most existing forgery detection works deal with images; few analyze administrative documents and go deeper to analyze their contents. © 2021 IEEE.

  • 3.
    Bouhennache, Rafik
    et al.
    Science and Technology Institute, University Center of Mila, DZA.
    Bouden, Toufik
    Mohammed Seddik Ben Yahia University of Jijel, DZA.
    Taleb-Ahmed, Abdelmalik
    University of Valenciennes, FRA.
    Cheddad, Abbas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering. Blekinge Institute of Technology.
    A new spectral index for the extraction of built-up land features from Landsat 8 satellite imagery (2019). In: Geocarto International, ISSN 1010-6049, E-ISSN 1752-0762, Vol. 34, no 14, p. 1531-1551. Article in journal (Refereed)
    Abstract [en]

    Extracting built-up areas from remote sensing data such as Landsat 8 satellite imagery is a challenge. We have investigated it by proposing a new index referred to as the Built-up Land Features Extraction Index (BLFEI). The BLFEI index takes advantage of its simplicity and good separability between the four major components of the urban system, namely built-up areas, barren land, vegetation and water. The histogram overlap method and the Spectral Discrimination Index (SDI) are used to study separability. The BLFEI index uses the two shortwave infrared bands and the red and green bands of the visible spectrum. OLI imagery of Algiers, Algeria, was used to extract built-up areas through BLFEI and some previously developed built-up indices used for comparison. The water areas are masked out, and Otsu's thresholding algorithm is then applied to automatically find the optimal value for extracting built-up land from the waterless regions. BLFEI, the new index, improved separability by 25% and accuracy by 5%.
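
    For readers who want to experiment with index-based extraction, the sketch below computes a BLFEI-style ratio from the four bands named in the abstract and applies Otsu's threshold. The exact band arithmetic here is an assumption for illustration only; consult the paper for the published formula.

```python
import numpy as np
from skimage.filters import threshold_otsu

def built_up_index(green, red, swir1, swir2):
    """Illustrative BLFEI-style ratio of the four bands the abstract names;
    this is NOT guaranteed to be the published formula."""
    visible = (green + red + swir2) / 3.0
    return (visible - swir1) / (visible + swir1 + 1e-12)

# Toy reflectance rasters standing in for Landsat 8 OLI bands.
rng = np.random.default_rng(0)
bands = {b: rng.uniform(0.05, 0.6, (100, 100))
         for b in ("green", "red", "swir1", "swir2")}
index = built_up_index(**bands)
mask = index > threshold_otsu(index)  # automatic cut between built-up and other land
```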

  • 4.
    Cheddad, Abbas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    On Box-Cox Transformation for Image Normality and Pattern Classification (2020). In: IEEE Access, E-ISSN 2169-3536, Vol. 8, p. 154975-154983, article id 9174711. Article in journal (Refereed)
    Abstract [en]

    A unique member of the power transformation family is known as the Box-Cox transformation. The latter can be seen as a mathematical operation that finds the optimum lambda (λ) value that maximizes the log-likelihood function to transform data to a normal distribution and to reduce heteroscedasticity. In data analytics, a normality assumption underlies a variety of statistical test models. This technique, however, is best known in statistical analysis for handling one-dimensional data. Herein, this paper revolves around the utility of such a tool as a pre-processing step to transform two-dimensional data, namely digital images, and studies its effect. Moreover, to reduce time complexity, it suffices to estimate the parameter lambda in real-time for large two-dimensional matrices by merely considering their probability density function as a statistical inference of the underlying data distribution. We compare the effect of this lightweight Box-Cox transformation with well-established state-of-the-art low-light image enhancement techniques. We also demonstrate the effectiveness of our approach through several test-bed data sets for generic improvement of the visual appearance of images and for ameliorating the performance of a colour pattern classification algorithm as an example application. Results with and without the proposed approach are compared using the AlexNet (transfer deep learning) pretrained model. To the best of our knowledge, this is the first time that the Box-Cox transformation is extended to digital images by exploiting histogram transformation.
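
    The Box-Cox transform referred to here is y = (x^λ - 1)/λ for λ ≠ 0, and y = ln x for λ = 0, with λ chosen by maximizing the log-likelihood. A minimal sketch applying it to image intensities with SciPy (a generic illustration, not the paper's real-time density-based estimator):

```python
import numpy as np
from scipy.stats import boxcox

# Toy 8-bit grayscale image; Box-Cox requires strictly positive input,
# hence the +1 shift before transforming.
rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(64, 64)).astype(np.float64)

flat = img.ravel() + 1.0
transformed, lam = boxcox(flat)  # lambda chosen by maximum likelihood
print(f"optimal lambda: {lam:.3f}")

# Rescale back to [0, 255] for display.
out = (transformed - transformed.min()) / (transformed.max() - transformed.min())
img_bc = (out.reshape(img.shape) * 255).astype(np.uint8)
```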

  • 5.
    Cheddad, Abbas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Structure Preserving Binary Image Morphing using Delaunay Triangulation (2017). In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 85, p. 8-14. Article in journal (Refereed)
    Abstract [en]

    Mathematical morphology has been of great significance to several scientific fields. Dilation, as one of the fundamental operations, has relied heavily on common methods based on set theory and on using specific shaped structuring elements to morph binary blobs. We hypothesised that by performing morphological dilation while exploiting the geometric relationship between dot patterns, one can gain some advantages. The Delaunay triangulation was our choice to examine the feasibility of this hypothesis due to its favourable geometric properties. We compared our proposed algorithm to existing methods, and it becomes apparent that Delaunay-based dilation has the potential to emerge as a powerful tool in preserving object structure and elucidating the influence of noise. Additionally, defining a structuring element is no longer needed in the proposed method, and the dilation is adaptive to the topology of the dot patterns. We assessed the property of object structure preservation by using common measurement metrics. We also demonstrated this property through handwritten digit classification using HOG descriptors extracted from dilated images of different approaches and trained using Support Vector Machines. The confusion matrix shows that our algorithm has the best accuracy estimate in 80% of the cases. In both experiments, our approach shows a consistent improved performance over other methods, which advocates for the suitability of the proposed method.
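
    A minimal sketch of the core idea, assuming foreground dots are triangulated and the triangles rasterized. The paper's actual algorithm is structure-preserving and adaptive to the dot topology, so treat the unpruned fill below as an illustration only:

```python
import numpy as np
from scipy.spatial import Delaunay
from skimage.draw import polygon

# Toy dot pattern (foreground pixel coordinates of a binary blob).
rng = np.random.default_rng(2)
points = rng.integers(10, 90, size=(40, 2))

canvas = np.zeros((100, 100), dtype=bool)
tri = Delaunay(points)
for simplex in tri.simplices:
    # Fill each triangle spanned by neighbouring dots; a structure-preserving
    # variant would prune long-edged triangles instead of keeping them all.
    rr, cc = polygon(points[simplex, 0], points[simplex, 1], canvas.shape)
    canvas[rr, cc] = True

print(canvas.sum(), "pixels on after Delaunay-based dilation")
```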

  • 6.
    Danielsson, Max
    et al.
    Sievert, Thomas
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mathematics and Natural Sciences.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Rasmusson, Jim
    Sony Mobile Communications AB.
    Feature Detection and Description using a Harris-Hessian/FREAK Combination on an Embedded GPU (2016). Conference paper (Refereed)
    Abstract [en]

    GPUs in embedded platforms are reaching performance levels comparable to desktop hardware, thus it becomes interesting to apply Computer Vision techniques. We propose, implement, and evaluate a novel feature detector and descriptor combination, i.e., we combine the Harris-Hessian detector with the FREAK binary descriptor. The implementation is done in OpenCL, and we evaluate the execution time and classification performance. We compare our approach with two other methods, FAST/BRISK and ORB. Performance data is presented for the mobile device Xperia Z3 and the desktop Nvidia GTX 660. Our results indicate that the execution times on the Xperia Z3 are insufficient for real-time applications while desktop execution shows future potential. Classification performance of Harris-Hessian/FREAK indicates that the solution is sensitive to rotation, but superior in scale variant images.
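
    OpenCV offers a rough single-machine analogue of this detector/descriptor pairing (plain Harris corners standing in for the paper's Harris-Hessian, FREAK from the contrib modules, and no OpenCL); a hedged sketch:

```python
import cv2
import numpy as np

# Synthetic grayscale frame standing in for a camera image.
rng = np.random.default_rng(6)
img = (rng.random((240, 320)) * 255).astype(np.uint8)

# Harris corner response as the detector stand-in (the paper uses a custom
# Harris-Hessian detector; this is only an approximation).
corners = cv2.goodFeaturesToTrack(img, maxCorners=500, qualityLevel=0.01,
                                  minDistance=5, useHarrisDetector=True)
keypoints = [cv2.KeyPoint(float(x), float(y), 7.0) for [[x, y]] in corners]

# FREAK binary descriptor (requires the opencv-contrib-python package).
freak = cv2.xfeatures2d.FREAK_create()
keypoints, descriptors = freak.compute(img, keypoints)
print(len(keypoints), descriptors.shape)  # FREAK yields 64-byte binary rows
```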

  • 7.
    Dasari, Siva Krishna
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Cheddad, Abbas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Palmquist, Jonatan
    Gkn Aerospace Engine Systems Sweden, Process Engineering Department, SWE.
    Melt-Pool Defects Classification for Additive Manufactured Components in Aerospace Use-Case (2020). In: 2020 7th International Conference on Soft Computing and Machine Intelligence, ISCMI 2020, Institute of Electrical and Electronics Engineers (IEEE), 2020, p. 249-254, article id 9311555. Conference paper (Refereed)
    Abstract [en]

    One of the crucial aspects of additive manufacturing is the monitoring of the welding process for quality assurance of components. A common way to analyse the welding process is through visual inspection of melt-pool images to identify possible defects in manufacturing. Recent literature studies have shown the potential use of prediction models for defect classification to speed up the manual verification criteria, since a huge amount of data is generated by additive manufacturing. Although a large amount of image data is available, the data need to be labelled manually by experts, which results in small sample datasets. Hence, to model small sample sizes and also to acquire the importance of parameters, we opted for a traditional machine learning method, Random Forests (RF). For feature extraction, we opted for the Polar Transformation and explored its applicability using the melt-pool image dataset and a publicly available shape image dataset. The results show that RF models with the Polar Transformation performed best on our case study datasets and second-best on the public dataset when compared to the Histogram of Oriented Gradients, HARALICK, XY-projections of an image, and Local Binary Patterns methods. As such, the Polar Transformation can be considered a suitable compact shape descriptor. © 2020 IEEE.
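
    A hedged sketch of the pipeline shape (polar unwrap, then Random Forests) using OpenCV and scikit-learn with toy arrays; the sizes and parameters are assumptions, not the paper's:

```python
import cv2
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def polar_features(img):
    """Unwrap an image around its centre: rows become radii, columns angles,
    giving a compact rotation-tolerant shape representation."""
    h, w = img.shape
    polar = cv2.warpPolar(img, (64, 64), (w / 2, h / 2),
                          maxRadius=min(h, w) / 2, flags=cv2.WARP_POLAR_LINEAR)
    return polar.ravel().astype(np.float32)

# Toy stand-ins for labelled melt-pool crops.
rng = np.random.default_rng(3)
images = rng.integers(0, 255, size=(40, 128, 128)).astype(np.uint8)
labels = rng.integers(0, 2, size=40)

X = np.stack([polar_features(im) for im in images])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
```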

  • 8.
    Garro, Valeria
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Sundstedt, Veronica
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Pose and visual attention: Exploring the effects of 3D shape near-isometric deformations on gaze (2020). In: Computer Science Research Notes, Vaclav Skala Union Agency, 2020, Vol. 2020, no 2020, p. 153-160. Conference paper (Refereed)
    Abstract [en]

    Recent research in 3D shape analysis focuses on the study of visual attention on rendered 3D shapes investigating the impact of different factors such as material, illumination, and camera movements. In this paper, we analyze how the pose of a deformable shape affects visual attention. We describe an eye-tracking experiment that studied the influence of different poses of non-rigid 3D shapes on visual attention. The subjects free-viewed a set of 3D shapes rendered in different poses and from different camera views. The fixation maps obtained by the aggregated gaze data were projected onto the 3D shapes and compared at vertex level. The results indicate an impact of the pose for some of the tested shapes and also that view variation influences visual attention. The qualitative analysis of the 3D fixation maps shows high visual focus on the facial regions regardless of the pose, coherent with previous works. The visual attention variation between poses appears to correspond to geometric salient features and semantically salient parts linked to the action represented by the pose. © 2020, Vaclav Skala Union Agency. All rights reserved.

  • 9.
    Hallösta, Simon
    et al.
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mathematics and Natural Sciences.
    Pettersson, Mats
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mathematics and Natural Sciences.
    Dahl, Mattias
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mathematics and Natural Sciences.
    Effects of Foreground Augmentations in Synthetic Training Data on the Use of UAVs for Weed Detection (2024). In: Proceedings of Machine Learning Research / [ed] Lutchyn T., Rivera A.R., Ricaud B., ML Research Press, 2024, Vol. 233. Conference paper (Refereed)
    Abstract [en]

    This study addresses the issue of black-grass, a herbicide-resistant weed that threatens wheat yields in Western Europe, through the use of high-resolution Unmanned Aerial Vehicles (UAVs) and synthetic data augmentation in precision agriculture. We mitigate challenges such as the need for large labeled datasets and environmental variability by employing synthetic data augmentations in training a Mask R-CNN model. Using a minimal dataset of 43 black-grass and 12 wheat field images, we achieved a 37% increase in Area Under the Curve (AUC) over the non-augmented baseline, with scaling as the most effective augmentation. The best model attained a recall of 53% at a precision of 64%, offering a promising approach for future precision agriculture applications. © NLDL 2024. All rights reserved.
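
    A minimal sketch of the kind of foreground augmentation evaluated here: pasting a scaled plant cutout onto a background field image. All names and ranges below are illustrative, not the study's pipeline:

```python
import cv2
import numpy as np

def paste_scaled(background, cutout, mask, scale, top_left):
    """Paste a foreground cutout (e.g., a plant) onto a background image at
    the given scale, compositing only where the binary mask is on."""
    h, w = int(cutout.shape[0] * scale), int(cutout.shape[1] * scale)
    cut = cv2.resize(cutout, (w, h))
    m = cv2.resize(mask, (w, h), interpolation=cv2.INTER_NEAREST).astype(bool)
    y, x = top_left
    roi = background[y:y + h, x:x + w]
    roi[m] = cut[m]
    return background

rng = np.random.default_rng(7)
field = rng.integers(0, 255, (512, 512, 3), dtype=np.uint8)   # toy background
plant = rng.integers(0, 255, (64, 64, 3), dtype=np.uint8)     # toy cutout
plant_mask = (rng.random((64, 64)) > 0.5).astype(np.uint8)
augmented = paste_scaled(field, plant, plant_mask, scale=1.3, top_left=(100, 150))
```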

  • 10.
    Hallösta, Simon
    et al.
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mathematics and Natural Sciences.
    Pettersson, Mats
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mathematics and Natural Sciences.
    Dahl, Mattias
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mathematics and Natural Sciences.
    Impact of Neural Network Architecture for Fingerprint Recognition (2024). In: Intelligent Systems and Pattern Recognition: Third International Conference, ISPR 2023, Hammamet, Tunisia, May 11–13, 2023, Revised Selected Papers, Part I / [ed] Akram Bennour, Ahmed Bouridane, Lotfi Chaari, Springer, 2024, Vol. 1940, p. 3-14. Conference paper (Refereed)
    Abstract [en]

    This work investigates the impact of the neural network architecture when performing fingerprint recognition. Three networks are studied: a Triplet network and two Siamese networks. They are evaluated on datasets with specified amounts of relative translation between fingerprints. The results show that the Siamese model based on contrastive loss performed best in all evaluated metrics. Moreover, the results indicate that the network with a categorical scheme performed worse than the other models, especially in recognizing images with high confidence. The Equal Error Rate (EER) of the best model ranged between 4% and 11%, which was on average 6.5 percentage points lower than that of the categorically schemed model. When increasing the translation between images, the networks were predominantly affected once the translation reached a fourth of the image. Our work concludes that architectures designed to cluster data have an advantage when designing an authentication system based on neural networks.
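
    The contrastive loss behind the best-performing Siamese model pulls matching pairs together and pushes non-matching pairs at least a margin apart. A standard PyTorch rendering of that formulation (the common Hadsell-style loss, not the paper's code):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(emb_a, emb_b, same_finger, margin=1.0):
    """same_finger is 1 for matching pairs, 0 otherwise. Matching pairs are
    penalized by squared distance, non-matching by squared margin shortfall."""
    d = F.pairwise_distance(emb_a, emb_b)
    pos = same_finger * d.pow(2)
    neg = (1 - same_finger) * F.relu(margin - d).pow(2)
    return (pos + neg).mean()

# Toy batch: 8 embedding pairs with binary match labels.
a, b = torch.randn(8, 128), torch.randn(8, 128)
y = torch.randint(0, 2, (8,)).float()
print(contrastive_loss(a, b, y))
```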

  • 11.
    Javadi, Mohammad Saleh
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mathematics and Natural Sciences.
    Computer Vision Algorithms for Intelligent Transportation Systems Applications (2018). Licentiate thesis, comprehensive summary (Other academic)
    Abstract [en]

    In recent years, Intelligent Transportation Systems (ITS) have emerged as an efficient way of enhancing traffic flow, safety and management. These goals are realized by combining various technologies and analyzing the acquired data from vehicles and roadways. Among all ITS technologies, computer vision solutions have the advantages of high flexibility, easy maintenance and a high price-performance ratio, which make them very popular for transportation surveillance systems. However, computer vision solutions are demanding and challenging due to computational complexity, reliability, efficiency and accuracy, among other aspects.

    In this thesis, three transportation surveillance systems based on computer vision are presented. These systems are able to interpret image data and extract information about the presence, speed and class of vehicles, respectively. The image data in these proposed systems are acquired using an Unmanned Aerial Vehicle (UAV) as a non-stationary source and a roadside camera as a stationary source. The goal of these works is to enhance the general accuracy and robustness of the systems under variant illumination and traffic conditions.

    This is a compilation thesis in systems engineering consisting of three parts. The red thread through each part is a transportation surveillance system. The first part presents a change detection system using aerial images of a cargo port. The extracted information shows how the space is utilized at various times, aiming for further management and development of the port. The proposed solution can be used at different viewpoints and illumination levels, e.g. at sunset. The method is able to transform the images taken from different viewpoints and match them together. Thereafter, it detects discrepancies between the images using a proposed adaptive local threshold. In the second part, a video-based vehicle speed estimation system is presented. The measured speeds are essential information for law enforcement and they also provide an estimation of traffic flow at certain points on the road. The system employs several intrusion lines to extract the movement pattern of each vehicle (non-equidistant sampling) as an input feature to the proposed analytical model. In addition, other parameters such as the camera sampling rate and the distances between intrusion lines are taken into account to address the uncertainty in the measurements and to obtain the probability density function of the vehicle's speed (a worked example follows below). In the third part, a vehicle classification system is provided to categorize vehicles into "private car", "light trailer", "lorry or bus" and "heavy trailer". This information can be used by authorities for surveillance and development of the roads. The proposed system consists of multiple fuzzy c-means clusterings using input features of the length, width and speed of each vehicle. The system has been constructed by using prior knowledge of traffic regulations regarding each class of vehicle in order to enhance the classification performance.
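
    A worked example of the second part's core arithmetic, with invented numbers: crossing instants are only known to the nearest frame, so the distance between intrusion lines and the camera sampling interval bound the speed estimate (v = D / (k·Δt)):

```python
# Illustrative numbers, not from the thesis: a vehicle crosses two
# intrusion lines 10 m apart, k = 12 frames apart, at 25 frames/s.
fps = 25.0
dt = 1.0 / fps
distance_m = 10.0
k_frames = 12

v = distance_m / (k_frames * dt)  # point estimate: 20.8 m/s (75 km/h)

# Each crossing instant is uncertain by up to one sampling interval, which
# bounds the speed and motivates a probability density rather than one number.
v_low = distance_m / ((k_frames + 1) * dt)   # 19.2 m/s
v_high = distance_m / ((k_frames - 1) * dt)  # 22.7 m/s
print(f"{v:.1f} m/s in [{v_low:.1f}, {v_high:.1f}]")
```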

  • 12.
    Javadi, Mohammad Saleh
    et al.
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mathematics and Natural Sciences.
    Dahl, Mattias
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mathematics and Natural Sciences.
    Pettersson, Mats
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mathematics and Natural Sciences.
    Change detection in aerial images using three-dimensional feature maps (2020). In: Remote Sensing, E-ISSN 2072-4292, Vol. 12, no 9, article id 1404. Article in journal (Refereed)
    Abstract [en]

    Interest in aerial image analysis has increased owing to recent developments in and availability of aerial imaging technologies, like unmanned aerial vehicles (UAVs), as well as a growing need for autonomous surveillance systems. Variant illumination, intensity noise, and different viewpoints are among the main challenges to overcome in order to determine changes in aerial images. In this paper, we present a robust method for change detection in aerial images. To accomplish this, the method extracts three-dimensional (3D) features for segmentation of objects above a defined reference surface at each instant. The acquired 3D feature maps, with two measurements, are then used to determine changes in a scene over time. In addition, the important parameters that affect measurement, such as the camera's sampling rate, image resolution, the height of the drone, and the pixel's height information, are investigated through a mathematical model. To exhibit its applicability, the proposed method has been evaluated on aerial images of various real-world locations and the results are promising. The performance indicates the robustness of the method in addressing the problems of conventional change detection methods, such as intensity differences and shadows.

  • 13.
    Javadi, Saleh
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mathematics and Natural Sciences.
    Computer Vision for Traffic Surveillance Systems: Methods and Applications (2021). Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Computer vision solutions play a significant role in intelligent transportation systems (ITS) by improving traffic flow, safety and management. In addition, they feature prominently in autonomous vehicles and their future development. The main advantages of vision-based systems are their flexibility, coverage and accessibility. Moreover, computational power and recent algorithmic advances have increased the promise of computer vision solutions and broadened their implementation. However, computational complexity, reliability and efficiency remain among the challenges facing vision-based systems.

    Most traffic surveillance systems in ITS comprise three major criteria: vehicle detection, tracking and classification. In this thesis, computer vision systems are introduced to accomplish goals corresponding to these three criteria: 1) to detect the changed regions of an industrial harbour's parking lot using aerial images, 2) to estimate the speed of the vehicles on the road using a stationary roadside camera and 3) to classify vehicles using a stationary roadside camera and aerial images.

    The first part of this thesis discusses change detection in aerial images, which is the core of many remote sensing applications. The aerial images were taken over an industrial harbour using unmanned aerial vehicles on different days and under various circumstances. This thesis presents two approaches to detecting changed regions: a local pattern descriptor and three-dimensional feature maps. These methods are robust to varying illumination and shadows. Later, the introduced 3D feature map generation model was employed for vehicle detection in aerial images.

    The second part of this thesis deals with vehicle speed estimation using roadside cameras. Information regarding the flow, speed and number of vehicles is essential for traffic surveillance systems. In this thesis, two vision-based vehicle speed estimation approaches are proposed. These analytical models consider the measurement uncertainties related to the camera sampling time. The main contribution of these models is to estimate a speed probability density function for every vehicle. Later, the speed estimation model was utilised for vehicle classification using a roadside camera.

    Finally, in the third part, two vehicle classification models are proposed for roadside and aerial images. The first model utilises the proposed speed estimation method to extract the speed of the passing vehicles. Then, we used a fuzzy c-means algorithm to classify vehicles using their speeds and dimension features. The results show that vehicle speed is a useful feature for distinguishing different categories of vehicles. The second model employs deep neural networks to detect and classify heavy vehicles in aerial images. In addition, the proposed 3D feature generation model was utilised to improve the performance of the deep neural network. The experimental results show that 3D feature information can significantly reduce false positives in the deep learning model's output.

    This thesis comprises two chapters: Introduction, and Publications. In the introduction section, we discuss the motivation for computer vision solutions and their importance. Furthermore, the concepts and algorithms used to construct the proposed methods are explained. The second chapter presents the included publications.

  • 14.
    Javadi, Saleh
    et al.
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mathematics and Natural Sciences.
    Dahl, Mattias
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mathematics and Natural Sciences.
    Pettersson, Mats
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mathematics and Natural Sciences.
    Vehicle Detection in Aerial Images Based on 3D Depth Maps and Deep Neural Networks (2021). In: IEEE Access, E-ISSN 2169-3536, Vol. 9, p. 8381-8391. Article in journal (Refereed)
    Abstract [en]

    Object detection in aerial images, particularly of vehicles, is highly important in remote sensing applications including traffic management, urban planning, parking space utilization, surveillance, and search and rescue. In this paper, we investigate the ability of three-dimensional (3D) feature maps to improve the performance of a deep neural network (DNN) for vehicle detection. First, we propose a DNN based on YOLOv3 with various base networks, including DarkNet-53, SqueezeNet, MobileNet-v2, and DenseNet-201. We assessed the base networks and their performance in combination with YOLOv3 on efficiency, processing time, and the memory that each architecture required. In the second part, 3D depth maps were generated using pairs of aerial images and their parallax displacement. Next, a fully connected neural network (fcNN) was trained on 3D feature maps of trucks, semi-trailers and trailers. A cascade of these networks was then proposed to detect vehicles in aerial images. Upon the DNN detecting a region, coordinates and confidence levels were used to extract the corresponding 3D features. The fcNN used 3D features as the input to improve the DNN performance. The data set used in this work was acquired from numerous flights of an unmanned aerial vehicle (UAV) across two industrial harbors over two years. The experimental results show that 3D features improved the precision of DNNs from 88.23 % to 96.43 % and from 97.10 % to 100 % when using DNN confidence thresholds of 0.01 and 0.05, respectively. Accordingly, the proposed system was able to successfully remove 72.22 % to 100 % of false positives from the DNN outputs. These results indicate the importance of 3D feature utilization to improve object detection in aerial images for future research. CC BY

  • 15.
    Jerkenhag, Joakim
    Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering.
    Comparing machine learning methods for classification and generation of footprints of buildings from aerial imagery (2019). Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Up-to-date mapping data is of great importance in social services and disaster relief as well as in city planning. The vast amounts of data and the constant increase of geographical changes lead to large loads of continuous manual analysis. This thesis takes the process of updating maps and breaks it down to the problem of discovering buildings, comparing different machine learning methods to automate the finding of buildings. The chosen methods, YOLOv3 and Mask R-CNN, are based on Region Convolutional Neural Networks (R-CNN) due to their capabilities of image analysis in both speed and accuracy. The image data supplied by Lantmäteriet makes up the training and testing data; this data is then used by the chosen machine learning methods. The methods are trained for different time limits, the generated models are tested and the results analysed. The results lay the ground for whether a model is reasonable to use in a fully or partly automated system for updating mapping data from aerial imagery. The tested methods showed volatile results through their first hour of training, YOLOv3 more so than Mask R-CNN. After the first hour and until the eighth hour, YOLOv3 shows a higher level of accuracy compared to Mask R-CNN. For YOLOv3, it seems that with more training the recall increases while precision decreases. For Mask R-CNN, however, there is some trade-off between recall and precision throughout the eight hours of training. While there is 90% confidence that the accuracy of YOLOv3 decreases for each hour of training after the first hour, the Mask R-CNN method shows that its accuracy increases for every hour of training, however with a low confidence that cannot be scientifically relied upon. Due to differences in setups, the image size varies between the methods even though they train and test on the same areas; this still results in a fair evaluation, in which YOLOv3 analyses one square kilometre 1.5 times faster than the Mask R-CNN method does. Both methods show potential for automated generation of footprints; however, the YOLOv3 method solely generates bounding boxes, leaving the step of polygonization to manual work, while Mask R-CNN does, as the name implies, create a mask by which the object is encapsulated. This extra step is thought to further automate the manual process and, with viable results, speed up the updating of map data.
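
    The polygonization step that masks enable can be sketched with OpenCV (a generic mask-to-polygon routine, not the thesis implementation):

```python
import cv2
import numpy as np

def mask_to_footprints(mask, eps_frac=0.01):
    """Turn a binary building mask (e.g., Mask R-CNN output) into simplified
    polygons, the step that plain bounding boxes leave to manual work."""
    contours, _ = cv2.findContours(mask.astype(np.uint8), cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    polygons = []
    for c in contours:
        eps = eps_frac * cv2.arcLength(c, True)       # tolerance per contour
        polygons.append(cv2.approxPolyDP(c, eps, True).reshape(-1, 2))
    return polygons

toy = np.zeros((100, 100), dtype=np.uint8)
toy[20:60, 30:80] = 1                        # one rectangular "building"
print(mask_to_footprints(toy)[0])            # roughly its four corners
```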

  • 16.
    Kapoor, Shrayash
    Point Cloud Data Augmentation for Safe 3D Object Detection using Geometric Techniques (2021). Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Background: Autonomous navigation has become increasingly popular. This surge in popularity caused a lot of interest in sensor technologies, driving the cost of sensor technology down. This has resulted in increasing developments in deep learning for computer vision. There is, however, not a lot of available, adaptable research for directly performing data augmentation on point cloud data independent of the training process. This thesis focuses on the impact of point cloud augmentation techniques on 3D object detection quality.

    Objectives: The objectives of this thesis are to evaluate the efficiency of geometric data augmentation techniques for point cloud data. The identified techniques are then implemented on a 3D object detector, and the results obtained are then compared based on selected metrics.

    Methods: This thesis uses two literature reviews to find appropriate point cloud techniques for data augmentation and a 3D object detector on which to implement them. Subsequently, an experiment is performed to quantitatively discern how much improvement augmentation offers in detection quality. Metrics used to compare the algorithms include precision, recall, average precision, mean average precision, memory usage and training time.

    Results: The literature review results indicate flipping, scaling, translation and rotation to be ideal candidates for performing geometric data augmentation and ComplexYOLO to be a capable detector for 3D object detection. Experimental results indicate that at the expense of some training time, the developed library "Aug3D" can boost the detection quality and results of the ComplexYOLO algorithm.

    Conclusions: After analysis of results, it was found that the implementation of geometric data augmentations (namely flipping, translation, scaling and rotation) yielded an increase of over 50% in the mean average precision for the performance of the ComplexYOLO 3D detection model on the Car and Pedestrian classes. 
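
    The four geometric augmentations the thesis identifies (flipping, scaling, translation, rotation) reduce to a few lines of NumPy on an (N, 3) cloud; the jitter ranges below are assumptions, not Aug3D's:

```python
import numpy as np

rng = np.random.default_rng(4)

def augment(points):
    """Random flip, scale, rotate (about z) and translate an (N, 3) cloud."""
    pts = points.copy()
    if rng.random() < 0.5:
        pts[:, 1] *= -1.0                       # flip across the x-z plane
    pts *= rng.uniform(0.95, 1.05)              # uniform scaling
    theta = rng.uniform(-np.pi / 4, np.pi / 4)  # yaw rotation angle
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    pts = pts @ rot.T
    pts += rng.uniform(-0.2, 0.2, size=3)       # translation jitter
    return pts

cloud = rng.normal(size=(1024, 3)).astype(np.float32)
print(augment(cloud).shape)  # (1024, 3)
```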

  • 17.
    Kusetogullari, Hüseyin
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science. Skövde University, SWE.
    Yavariabdi, Amir
    KTO Karatay University, TUR.
    Hall, Johan
    Arkiv Digital, SWE.
    Lavesson, Niklas
    Jönköping University, SWE.
    DIGITNET: A Deep Handwritten Digit Detection and Recognition Method Using a New Historical Handwritten Digit Dataset (2021). In: Big Data Research, ISSN 2214-5796, E-ISSN 2214-580X, Vol. 23, article id 100182. Article in journal (Refereed)
    Abstract [en]

    This paper introduces a novel deep learning architecture, named DIGITNET, and a large-scale digit dataset, named DIDA, to detect and recognize handwritten digits in historical document images written in the nineteenth century. To generate the DIDA dataset, digit images were collected from 100,000 Swedish handwritten historical document images, which were written by different priests with different handwriting styles. This dataset contains three sub-datasets, including single digits, large-scale bounding box annotated multi-digits, and digit strings, with 250,000, 25,000, and 200,000 samples in Red-Green-Blue (RGB) color space, respectively. Moreover, DIDA is used to train the DIGITNET network, which consists of two deep learning architectures, called DIGITNET-dect and DIGITNET-rec, to isolate digits and recognize digit strings in historical handwritten documents. In the DIGITNET-dect architecture, to extract features from digits, three residual units, each containing three convolutional neural network structures, are used, and then a detection strategy based on the You Only Look Once (YOLO) algorithm is employed to detect handwritten digits at two different scales. In DIGITNET-rec, the detected isolated digits are passed through three differently designed Convolutional Neural Network (CNN) architectures, and the classification results of the three CNNs are combined using a voting scheme to recognize digit strings. The proposed model is also trained with various existing handwritten digit datasets and then validated over historical handwritten digit strings. The experimental results show that the proposed architecture trained with DIDA (publicly available from: https://didadataset.github.io/DIDA/) outperforms the state-of-the-art methods. © 2020 The Author(s)
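
    The step that combines the three recognition CNNs can be read as a per-digit majority vote; a schematic NumPy version (not the released DIGITNET code; ties fall to the smaller class label):

```python
import numpy as np

def vote(pred_a, pred_b, pred_c):
    """Majority vote over the per-digit class predictions of three CNNs."""
    stacked = np.stack([pred_a, pred_b, pred_c])              # (3, n_digits)
    # Count votes per class (0-9) in every column, then take the mode.
    counts = np.apply_along_axis(np.bincount, 0, stacked, minlength=10)
    return counts.argmax(axis=0)                              # (n_digits,)

print(vote(np.array([3, 7, 1]), np.array([3, 1, 1]), np.array([5, 7, 2])))
# -> [3 7 1]
```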

  • 18.
    Lekamlage, Charitha Dissanayake
    et al.
    Blekinge Institute of Technology. student.
    Afzal, Fabia
    Blekinge Institute of Technology. student.
    Westerberg, Erik
    Blekinge Institute of Technology. student.
    Cheddad, Abbas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Mini-DDSM: Mammography-based Automatic Age Estimation (2020). In: ACM International Conference Proceeding Series, Association for Computing Machinery, 2020, p. 1-6, article id 3441370. Conference paper (Refereed)
    Abstract [en]

    Age estimation has attracted attention for its various medical applications. There are many studies on human age estimation from biomedical images. However, as far as we know, no research has been done on mammograms for age estimation. The purpose of this study is to devise an AI-based model for estimating age from mammogram images. Due to the lack of public mammography data sets that have the age attribute, we resorted to using a web crawler to download thumbnail mammographic images and their age fields from a public data set, the Digital Database for Screening Mammography. The original images in this data set can unfortunately only be retrieved by a software tool which is broken. Subsequently, we extracted deep learning features from the collected data set, from which we built a model using a Random Forests regressor to estimate age automatically. The performance assessment was measured using mean absolute error values. The average error value out of 10 tests on random selections of samples was around 8 years. In this paper, we show the merits of this approach for filling in missing age values. We ran logistic and linear regression models on another independent data set to further validate the advantage of our proposed work. This paper also introduces the free-access Mini-DDSM data set. © 2020 ACM.
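
    The modelling step (a Random Forests regressor on deep features, scored by mean absolute error) looks roughly like the scikit-learn sketch below; the features here are random stand-ins, since the paper's feature extractor is not reproduced:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Stand-ins for deep features extracted from mammograms and their age labels.
rng = np.random.default_rng(5)
features = rng.normal(size=(500, 256))
ages = rng.uniform(25, 80, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(features, ages, random_state=0)
reg = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("MAE (years):", mean_absolute_error(y_te, reg.predict(X_te)))
```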

  • 19.
    Liang, Xusheng
    et al.
    Blekinge Institute of Technology. student.
    Cheddad, Abbas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Hall, Johan
    ArkivDigital AB, SWE.
    Comparative Study of Layout Analysis of Tabulated Historical Documents (2021). In: Big Data Research, ISSN 2214-5796, E-ISSN 2214-580X, Vol. 24, article id 100195. Article in journal (Refereed)
    Abstract [en]

    Nowadays, the field of multimedia retrieval systems has earned a lot of attention as it helps retrieve information more efficiently and accelerates daily tasks. Within this context, image processing techniques such as layout analysis and word recognition play an important role in transcribing content in printed or handwritten documents into digital data that can be further processed. This transcription procedure is called document digitization. This work stems from an industrial need, namely, a Swedish company (ArkivDigital AB) has scanned more than 80 million pages of Swedish historical documents from all over the country, and there is a high demand to transcribe the contents into digital data. Such a process starts by figuring out the text location which, seen from another angle, is merely table layout analysis. In this study, the aim is to reveal the most effective solution to extract the document layout with respect to Swedish handwritten historical documents that are characterized by their tabular forms. In short, the outcomes of public tools (i.e., Breuel's OCRopus method), traditional image processing techniques (e.g., Hessian/Gabor filters, Hough transform, Histograms of Oriented Gradients -HOG- features) and machine learning techniques (e.g., support vector machines, transfer learning) are studied and compared. Results show that the existing OCR tool cannot carry out the layout analysis task on our Swedish historical handwritten documents. Traditional image processing techniques are mildly capable of extracting the general table layout in these documents, but the accuracy is enhanced by introducing machine learning techniques. The best performing approach will be used in our future document mining research to allow for the development of scalable resource-efficient systems for big data analytics. © 2021 Elsevier Inc.

  • 20.
    Lun, Zhao
    et al.
    Shenzhen Polytechnic, China.
    Pan, Yunlong
    Shenzhen Polytechnic, China.
    Wang, Sen
    Kunming University of Science and Technology, China.
    Abbas, Zeshan
    Shenzhen Polytechnic, China.
    Islam, Md. Shafiqul
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mechanical Engineering.
    Yin, Sufeng
    Guangdong Songshan Polytechnic, China.
    Skip-YOLO: Domestic Garbage Detection Using Deep Learning Method in Complex Multi-scenes (2023). In: International Journal of Computational Intelligence Systems, ISSN 1875-6891, E-ISSN 1875-6883, Vol. 16, no 1, article id 139. Article in journal (Refereed)
    Abstract [en]

    It is of great significance to identify all types of domestic garbage quickly and intelligently to improve people's quality of life. Based on a visual analysis of feature map changes in different neural networks, a Skip-YOLO model is proposed for real-life garbage detection, targeting the problem of recognizing garbage with similar features. First, the receptive field of the model is enlarged through a large-size convolution kernel, which enhances the shallow information of images. Second, the high-dimensional features of the garbage maps are extracted by dense convolutional blocks. The sensitivity to similar features in the same type of garbage is increased by strengthening the sharing of shallow low-semantic and deep high-semantic information. Finally, multiscale high-dimensional feature maps are integrated and routed to the YOLO layer for predicting garbage type and location. Compared with the YOLOv3 baseline in our experiments, the overall detection accuracy is increased by 22.5% and the average recall rate by 18.6%. In qualitative comparisons, it successfully detects domestic garbage in complex multi-scenes. In addition, this approach alleviates the overfitting problem of deep residual blocks. An application case of a waste sorting production line is used to further highlight the generalization performance of the method. © 2023, Springer Nature B.V.

  • 21.
    Mhathesh, T. S. R.
    et al.
    Karunya Institute of Technology and Sciences, IND.
    Andrew, J.
    Karunya Institute of Technology and Sciences, IND.
    Martin Sagayam, K.
    Karunya Institute of Technology and Sciences, IND.
    Henesey, Lawrence
    A 3D convolutional neural network for bacterial image classification (2021). In: Advances in Intelligent Systems and Computing / [ed] Peter J.D., Fernandes S.L., Alavi A.H., Alavi A.H., Springer, 2021, Vol. 1167, p. 419-431. Conference paper (Refereed)
    Abstract [en]

    Identification and analysis of biological microscopy images need high focus and years of experience to master. The rise of deep neural networks enables analysts to achieve the desired results with reduced time and cost. Light sheet fluorescence microscopy is one type of 3D microscopy imaging. Processing microscopy images is a tedious process as they consist of low-level features. It is necessary to use proper image processing techniques to extract the low-level features of biological microscopy images. Deep neural networks (DNN) are efficient in extracting the features of images and are able to classify with high accuracy. Convolutional neural networks (CNN) are one type of neural network that can provide promising results with low error rates. The ability of CNNs to extract the low-level features of images makes them popular for image classification. In this paper, a CNN-based 3D bacterial image classification is proposed. 3D images contain more in-depth features than 2D images. The proposed CNN model is trained on 3D light sheet fluorescence microscopy images of larval zebrafish. The proposed CNN model classifies bacterial and non-bacterial images effectively. Intensive experimental analyses are carried out to find the optimal complexity and to obtain better classification accuracy. The proposed model provides better results than human comprehension and other traditional machine learning approaches like random forest, support vector classifier, etc. The details of the network architecture, regularization, and hyperparameter optimization techniques are also presented. © Springer Nature Singapore Pte Ltd 2021.

  • 22.
    Moghimi, Armin
    et al.
    K N Toosi University, IRN.
    Celik, Turgay
    University of the Witwatersrand, ZAF.
    Mohammadzadeh, Ali
    K.N.Toosi University of Technology, IRN.
    Kusetogullari, Hüseyin
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Comparison of Keypoint Detectors and Descriptors for Relative Radiometric Normalization of Bitemporal Remote Sensing Images (2021). In: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, ISSN 1939-1404, E-ISSN 2151-1535, no 4, p. 4063-4073. Article in journal (Refereed)
    Abstract [en]

    This paper compares the performance of the most commonly used keypoint detectors and descriptors (SIFT, SURF, KAZE, AKAZE, ORB, and BRISK) for Relative Radiometric Normalization (RRN) of unregistered bitemporal multi-spectral images. The keypoints matched between subject and reference images represent possible unchanged regions and are used in forming a Radiometric Control Set (RCS). The initial RCS is further refined by removing the matched keypoints with a low cross-correlation. The final RCS is used to approximate a linear mapping between the corresponding bands of the subject and reference images. This procedure is validated on five datasets of unregistered multi-spectral image pairs acquired by inter/intra sensors in terms of RRN accuracy, visual quality, quality and quantity of the samples in the RCS, and computing time. The experimental results show that keypoint-based RRN is robust against variations in spatial resolution, illumination, and sensors. The blob detectors (SURF, SIFT, KAZE, and AKAZE) are more accurate on average than the corner detectors (ORB and BRISK) in RRN; however, they are slower to compute. The source code and datasets used in the experiments are available at https://github.com/ArminMoghimi/keypoint-based-RRN to support reproducible research in remote sensing. CC BY
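
    A compressed sketch of the keypoint-based RRN idea with OpenCV SIFT: matched keypoints supply the radiometric control set, and a linear gain/offset is fitted per band. It omits the paper's cross-correlation refinement, and the synthetic image pair merely stands in for real bitemporal scenes:

```python
import cv2
import numpy as np

# Synthetic pair: the "subject" band is a radiometrically distorted copy
# of the "reference" band (gain 0.8, offset 20).
rng = np.random.default_rng(8)
reference = cv2.GaussianBlur((rng.random((256, 256)) * 255).astype(np.uint8),
                             (9, 9), 3)
subject = np.clip(0.8 * reference.astype(float) + 20, 0, 255).astype(np.uint8)

sift = cv2.SIFT_create()
kp_s, des_s = sift.detectAndCompute(subject, None)
kp_r, des_r = sift.detectAndCompute(reference, None)
matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des_s, des_r)

# Matched keypoints act as the radiometric control set: sample the band at
# each location and fit gain/offset by least squares.
s_vals = np.array([subject[int(k.pt[1]), int(k.pt[0])]
                   for k in (kp_s[m.queryIdx] for m in matches)], dtype=float)
r_vals = np.array([reference[int(k.pt[1]), int(k.pt[0])]
                   for k in (kp_r[m.trainIdx] for m in matches)], dtype=float)
gain, offset = np.polyfit(s_vals, r_vals, 1)
normalized = np.clip(gain * subject.astype(float) + offset, 0, 255).astype(np.uint8)
print(f"recovered gain {gain:.2f}, offset {offset:.1f}")  # ~1.25 and ~-25
```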

  • 23.
    Moghimi, Armin
    et al.
    K. N. Toosi University of Technology, IRN.
    Sarmadian, Amin
    K. N. Toosi University of Technology, IRN.
    Mohammadzadeh, Ali
    K. N. Toosi University of Technology, IRN.
    Celik, Turgay
    University of the Witwatersrand, ZAF.
    Amani, Meisam
    Wood Environment and Infrastructure Solutions, CAN.
    Kusetogullari, Hüseyin
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Distortion Robust Relative Radiometric Normalization of Multitemporal and Multisensor Remote Sensing Images Using Image Features (2022). In: IEEE Transactions on Geoscience and Remote Sensing, ISSN 0196-2892, E-ISSN 1558-0644, Vol. 60, article id 5400820. Article in journal (Refereed)
    Abstract [en]

    In this article, we propose a novel framework to radiometrically correct unregistered multisensor image pairs based on the extracted feature points with the KAZE detector and the conditional probability (CP) process in the linear model fitting. In this method, the scale, rotation, and illumination invariant radiometric control set samples (SRII-RCSS) are first extracted by the blockwise KAZE strategy. They are then distributed uniformly over both textured and texture-less land use/land cover (LULC) using grid interpolation and a set of nearest-neighbors. Subsequently, SRII-RCSS are scored by a similarity measure, and the histogram of the scores is then used to refine SRII-RCSS. The normalized subject image is produced by adjusting the subject image to the reference image using the CP-based linear regression (CPLR) based on the optimal SRII-RCSS. The registered normalized image is finally generated by registration of the normalized subject image to the reference image through a two-pass registration method, namely affine-B-spline, and then it is enhanced by updating the normalization coefficient of CPLR based on the SRII-RCSS. In this study, eight multitemporal data sets acquired by inter/intra satellite sensors were used in tests to comprehensively assess the efficiency of the proposed method. Experimental results show that the proposed method outperforms the existing state-of-the-art relative radiometric normalization (RRN) methods both qualitatively and quantitatively, indicating its capability for RRN of unregistered multisensor image pairs. © IEEE

  • 24.
    Rutkowska, Danuta
    et al.
    Information Technology Institute, POL.
    Kurach, Damian
    Czestochowa University of Technology, POL.
    Rakus-Andersson, Elisabeth
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mathematics and Natural Sciences.
    Face Recognition with Explanation by Fuzzy Rules and Linguistic Description (2020). In: Lecture Notes in Computer Science / [ed] Rutkowski L., Scherer R., Korytkowski M., Pedrycz W., Tadeusiewicz R., Zurada J.M., Springer Science and Business Media Deutschland GmbH, 2020, Vol. 12415, p. 338-350. Conference paper (Refereed)
    Abstract [en]

    In this paper, a new approach to face recognition is proposed. The knowledge represented by fuzzy IF-THEN rules, with type-1 and type-2 fuzzy sets, is employed in order to generate a linguistic description of human faces in digital pictures. Then, an image recognition system can recognize and retrieve a picture (image of a face) or classify face images based on the linguistic description. Such a system is explainable – it can explain its decision based on the fuzzy rules. © 2020, Springer Nature Switzerland AG.

  • 25.
    Rutkowska, Danuta
    et al.
    University of Social Sciences, POL.
    Kurach, Damian
    Czestochowa University of Technology, POL.
    Rakus-Andersson, Elisabeth
    Blekinge Institute of Technology, Faculty of Engineering, Department of Mathematics and Natural Sciences.
    Fuzzy Granulation Approach to Face Recognition (2021). In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) / [ed] Rutkowski L., Scherer R., Korytkowski M., Pedrycz W., Tadeusiewicz R., Zurada J.M., Springer Science and Business Media Deutschland GmbH, 2021, Vol. 12855, p. 495-510. Conference paper (Refereed)
    Abstract [en]

    In this paper, a new approach to face description is proposed. The linguistic description of human faces in digital pictures is generated within a framework of fuzzy granulation. Fuzzy relations and fuzzy relational rules are applied in order to create the image description. By use of type-2 fuzzy sets, fuzzy relations, and fuzzy IF-THEN rules, an image recognition system can infer and explain its decision. Such a system can retrieve an image, recognize, and classify – especially a human face – based on the linguistic description. © 2021, Springer Nature Switzerland AG.

  • 26.
    Shehu, Harisu Abdullahi
    et al.
    Victoria University of Wellington, NZL.
    Sharif, Md. Haidar
    University of Hail, SAU.
    Sharif, Md. Haris Uddin
    University of the Cumberlands, USA.
    Datta, Ripon
    University of the Cumberlands, USA.
    Tokat, Sezai
    Pamukkale University, TUR.
    Uyaver, Sahin
    Turkish-German University, TUR.
    Kusetogullari, Hüseyin
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Ramadan, Rabie A.
    Cairo University, EGY.
    Deep Sentiment Analysis: A Case Study on Stemmed Turkish Twitter Data (2021). In: IEEE Access, E-ISSN 2169-3536, Vol. 9, p. 56836-56854. Article in journal (Refereed)
    Abstract [en]

    Sentiment analysis using stemmed Twitter data from various languages is an emerging research topic. In this paper, we address three data augmentation techniques, namely Shift, Shuffle, and Hybrid, to increase the size of the training data; we then use three key types of deep learning (DL) models, namely recurrent neural networks (RNN), convolutional neural networks (CNN), and hierarchical attention networks (HAN), to classify stemmed Turkish Twitter data for sentiment analysis. The performance of these DL models has been compared with existing traditional machine learning (TML) models. The performance of the TML models was affected negatively by the stemmed data, but the performance of the DL models improved greatly with the utilization of the augmentation techniques. Based on the analysis of simulation, experimental, and statistical results on identical datasets, it is concluded that the TML models outperform the DL models with respect to both the training-time (TTM) and runtime (RTM) complexities of the algorithms, but the DL models outperform the TML models with respect to the most important performance factors as well as the average performance rankings. CC BY

  • 27.
    Teki, Sai Ajith
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Optimal Optimizer Hyper-Parameters for 2D to 3D Reconstruction (2021). Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    2D to 3D reconstruction is an ill-posed problem in the field of Autonomous Robot Navigation. Many practitioners are tend to utilize the enormous success of Deep Learning techniques like CNN, ANN etc to solve tasks related to this 2D to 3D reconstruction. Generally, every deep learning model involves implementation of different optimizers related to the tasks to lower the possible negativity in its results and selection of hyper parameter values for these optimizers during the process of training the model with required dataset.Selection of this optimizer hyper-parameters requires in-depth knowledge and trials and errors. So proposing optimal hyper parameters for optimizers results in no waste in computational resources and time.Hence solution for the selected task cab found easily.

    The main objective of this research is to propose optimal hyper parameter values of various deep learning optimizers related to 2D to 3D reconstruction and proposing best optimizer among them in terms of computational time and resources

    To achieve the goal of this study two research methods are used in our work. The first one is a Systematic Literature Review; whose main goal is to reveal the widely selected and used optimizers for 2D to 3D reconstruction model using 3D Deep Learning techniques.The second, an experimental methodology is deployed, whose main goal is to propose the optimal hyper parameter values for respective optimizers like Adam, SGD+Momentum, Adagrad, Adadelta and Adamax which are used in 3D reconstruction models.

In terms of computational time, the Adamax optimizer outperformed all other optimizers used, with a training time of 1970 min, testing time of 3360 min, evaluation-1 time of 16 min and evaluation-2 time of 14 min. In terms of average point cloud points, Adamax outperformed all other optimizers used, with a mean value of 28451.04. In terms of pred->GT and GT->pred values, the Adamax optimizer outperformed all other optimizers, with mean values of 4.742 and 4.600, respectively.

Point cloud images with the respective dense cloud points were obtained as results of our experiment. From the above results, the Adamax optimizer proved to be the best in terms of visualization of point cloud images, with the following optimal hyper-parameter values: epochs: 1000, learning rate: 1e-2, chunk size: 32, batch size: 32.
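As a concrete illustration, the reported optimal settings can be wired into a PyTorch training setup. The network below is a dummy stand-in, not the thesis's reconstruction model, and the chunk size (which concerns how the dataset is fed) is omitted.

```python
import torch
import torch.nn as nn

# Stand-in network; the thesis's actual 2D-to-3D reconstruction model is not reproduced here.
model = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(), nn.Linear(512, 3 * 1024))

# Optimal values reported above: learning rate 1e-2, batch size 32, 1000 epochs.
optimizer = torch.optim.Adamax(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

inputs = torch.randn(32, 2048)        # dummy batch of 2D image features
targets = torch.randn(32, 3 * 1024)   # dummy point cloud targets (1024 xyz points)

for epoch in range(1000):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
```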

In this study, the Adamax optimizer, with optimal hyper-parameter values and better point cloud images, is shown to be the best optimizer for 2D to 3D reconstruction tasks that deal with point cloud images.

    Download full text (pdf)
    Optimal Optimizer Hyper-parameters for 2D to 3D Reconstruction
  • 28.
    Turesson, Eric
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Multi-camera Computer Vision for Object Tracking: A comparative study2021Independent thesis Advanced level (professional degree), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

Background: Video surveillance is a growing area which can help with deterring crime, supporting investigations, or gathering statistics. These are just some areas where video surveillance can aid society. However, its efficiency could be increased by introducing tracking; more specifically, tracking between cameras in a network. Automating this process could reduce the need for humans to monitor and review footage, since the system can track and inform the relevant people on its own. This has a wide array of application areas, such as forensic investigation, crime alerting, or tracking down people who have disappeared.

Objectives: We first want to investigate the common setup of real-time multi-target multi-camera tracking (MTMCT) systems. Next, we want to investigate how the components in an MTMCT system affect each other and the complete system. Lastly, we want to see how image enhancement can affect the MTMCT system.

Methods: To achieve our objectives, we conducted a systematic literature review to gather information. Using this information, we implemented an MTMCT system in which we evaluated the components to see how they interact in the complete system. Lastly, we implemented two image enhancement techniques to see how they affect the MTMCT system.

Results: As we discovered, MTMCT systems are most often constructed using detection for discovering objects, tracking to follow the objects within a single camera, and a re-identification method to ensure that objects across cameras receive the same ID. The different components have a considerable effect on each other, in that they can degrade or improve each other's performance. For example, the quality of the bounding boxes affects the data which re-identification can extract. We found that the image enhancement techniques we used did not introduce any significant improvement.

Conclusions: The most common structure for MTMCT is detection, tracking and re-identification. From our findings, we can see that all components affect each other, but re-identification is the one most affected by the other components and by the image enhancement. The two tested image enhancement techniques did not yield enough improvement, but other image enhancement methods could be used to make the MTMCT system perform better. The MTMCT system we constructed did not manage to reach real-time performance.
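The detection, tracking, and re-identification flow described above can be summarized as a thin pipeline skeleton. The component interfaces below (detect, update, embed, match) are hypothetical placeholders, not the thesis's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    global_id: int
    camera_id: int
    boxes: list = field(default_factory=list)

class MTMCTPipeline:
    """Skeleton of the common detection / tracking / re-identification structure."""

    def __init__(self, detector, tracker, reid):
        self.detector = detector   # returns bounding boxes per frame
        self.tracker = tracker     # associates boxes over time within one camera
        self.reid = reid           # matches appearance features across cameras
        self.tracks = {}           # global_id -> Track

    def process_frame(self, camera_id, frame):
        boxes = self.detector.detect(frame)
        local_tracks = self.tracker.update(camera_id, boxes)
        for local in local_tracks:
            features = self.reid.embed(frame, local.box)
            global_id = self.reid.match(features)  # same person -> same global ID
            track = self.tracks.setdefault(global_id, Track(global_id, camera_id))
            track.boxes.append(local.box)
        return self.tracks
```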

    Download full text (pdf)
    fulltext
  • 29.
    Wen, Wei
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Communication Systems.
    Khatibi, Siamak
    Blekinge Institute of Technology, Faculty of Computing, Department of Communication Systems.
    Towards Measuring of Depth Perception from Monocular Shadow Technique with Application in a Classical Painting2016In: Journal of Computers, ISSN 1796-203X, Vol. 11, p. 310-319Article in journal (Refereed)
    Abstract [en]

Depth perception is one of the important abilities of the human visual system for perceiving the three-dimensional world. The shadow technique, which offers different depth information from different viewing points and is known as Da Vinci stereopsis, has been used in classical paintings. In this paper, we report a method towards measuring the relative depth information stimulated by Da Vinci stereopsis in a classical painting. We set up a positioning array of cameras to capture images of the portrait with a high-resolution camera, where the changes of shadow areas are measured by characterizing the effects as point and line changes. The results show that the 3D effects of the classical painting are not only a perceptual phenomenon but are also physically tangible and measurable. We confirm the validity of the method by applying it to a typical single image as well and comparing the results between the single image and the portrait.

    Download full text (pdf)
    ICCEE2015
  • 30.
    Wen, Wei
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Technology and Aesthetics.
Khatibi, Siamak
    Blekinge Institute of Technology, Faculty of Computing, Department of Technology and Aesthetics.
Estimation of Image Sensor Fill Factor Using a Single Arbitrary Image2017In: Sensors, E-ISSN 1424-8220, Vol. 17, no 3, article id 620Article in journal (Refereed)
    Abstract [en]

Achieving a high fill factor is a bottleneck problem for capturing high-quality images. There are hardware and software solutions to overcome this problem, in which the fill factor is assumed to be known. However, the fill factor is kept an industrial secret by most image sensor manufacturers due to its direct effect on the assessment of sensor quality. In this paper, we propose a method to estimate the fill factor of a camera sensor from a single arbitrary image. The virtual response function of the imaging process and the sensor irradiance are estimated from the generation of virtual images. Then the global intensity values of the virtual images are obtained, which are the result of fusing the virtual images into a single high dynamic range radiance map. A non-linear function is inferred from the original and global intensity values of the virtual images, and the fill factor is estimated from the conditional minimum of the inferred function. The method is verified using images from two datasets. The results show that our method estimates the fill factor correctly, with significant stability and accuracy, from one single arbitrary image, as indicated by the low standard deviation of the estimated fill factors across the images for each camera.

    Download full text (pdf)
    fulltext
  • 31.
    Westphal, Florian
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Data and Time Efficient Historical Document Analysis2020Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

Over the last decades, companies and government institutions have gathered vast collections of images of historical handwritten documents. In order to make these collections truly useful to the broader public, images suffering from degradations, such as faded ink, bleed-through or stains, need to be made readable, and the collections as a whole need to be made searchable. Readability can be achieved by separating text foreground from page background using document image binarization, while searchability by search string or by example image can be achieved through word spotting. Developing algorithms with reasonable binarization or word spotting performance is a difficult task. Additional challenges are to make these algorithms execute fast enough to process vast collections of images in a reasonable amount of time, and to enable them to learn from few labeled training samples. In this thesis, we explore heterogeneous computing, parameter prediction, and enhanced throughput as ways to reduce the execution time of document image binarization algorithms. We find that parameter prediction and mapping a heuristics-based binarization algorithm to the GPU lead to 1.7x and 3.5x increases in execution performance, respectively. Furthermore, for a learning-based binarization algorithm using recurrent neural networks, we identify the number of pixels processed at once as a way to trade off execution time against binarization quality. The achieved increase in throughput results in a 3.8 times faster overall execution time. Additionally, we explore guided machine learning (gML) as a possible approach to reduce the required amount of training data for learning-based algorithms for binarization, character recognition and word spotting. We propose an initial gML system for binarization, which allows a user to improve an algorithm’s binarization quality by selecting suitable training samples. Based on this system, we identify and pursue three different directions, viz., formulation of a clear definition of gML, identification of an efficient knowledge transfer mechanism from user to learner, and automation of sample selection. We explore the Learning Using Privileged Information paradigm as a possible knowledge transfer mechanism by using character graphs as privileged information for training a neural network based character recognizer. Furthermore, we show that, given a suitable word image representation, automatic sample selection can help to reduce the amount of training data required for word spotting by up to 69%.
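The thesis targets a state-of-the-art method; purely as background on what heuristics-based binarization means, a classic Sauvola-style thresholding (a standard baseline, not the algorithm parallelized in the thesis) fits in a few lines:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sauvola_binarize(gray, window=25, k=0.2, r=128.0):
    """Sauvola thresholding: per-pixel threshold from the local mean and std."""
    gray = gray.astype(np.float64)
    mean = uniform_filter(gray, window)
    sq_mean = uniform_filter(gray ** 2, window)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
    threshold = mean * (1.0 + k * (std / r - 1.0))
    return (gray > threshold).astype(np.uint8) * 255  # 255 = background, 0 = text

page = (np.random.rand(256, 256) * 255).astype(np.uint8)  # stand-in for a scanned page
binary = sauvola_binarize(page)
```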

    Download full text (pdf)
    fulltext
  • 32.
    Westphal, Florian
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Efficient Document Image Binarization using Heterogeneous Computing and Interactive Machine Learning2018Licentiate thesis, comprehensive summary (Other academic)
    Abstract [en]

Large collections of historical document images have been gathered by companies and government institutions for decades. More recently, these collections have been made available to a larger public via the Internet. However, to make accessing them truly useful, the contained images need to be made readable and searchable. One step in that direction is document image binarization, the separation of text foreground from page background. This separation makes the text shown in the document images easier to process for humans and other image processing algorithms alike. While reasonably well-working binarization algorithms exist, it is not sufficient just to be able to perform the separation of foreground and background well. This separation also has to be achieved efficiently, in terms of execution time, but also in terms of the training data used by machine learning based methods. This is necessary to make binarization not only theoretically possible, but also practically viable.

In this thesis, we explore different ways to achieve efficient binarization in terms of execution time by improving the implementation and the algorithm of a state-of-the-art binarization method. We find that parameter prediction, as well as mapping the algorithm onto the graphics processing unit (GPU), helps to improve its execution performance. Furthermore, we propose a binarization algorithm based on recurrent neural networks and evaluate the choice of its design parameters with respect to their impact on execution time and binarization quality. Here, we identify a trade-off between binarization quality and execution performance based on the algorithm’s footprint size, and show that a dynamically weighted training loss tends to improve the binarization quality. Lastly, we address the problem of training data efficiency by evaluating the use of interactive machine learning to reduce the required amount of training data for our recurrent neural network based method. We show that user feedback can help to achieve better binarization quality with less training data, and that visualized uncertainty helps to guide users towards giving more relevant feedback.
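The role of uncertainty can be illustrated with a generic active-learning heuristic that ranks image patches by prediction entropy so the least certain ones are labeled first. This is a sketch of the general idea, not the thesis's exact mechanism:

```python
import numpy as np

def pixel_entropy(p_foreground):
    """Binary entropy of per-pixel foreground probabilities."""
    p = np.clip(p_foreground, 1e-7, 1 - 1e-7)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def rank_patches_by_uncertainty(prob_maps):
    """Return patch indices sorted from most to least uncertain."""
    scores = [pixel_entropy(p).mean() for p in prob_maps]
    return np.argsort(scores)[::-1]

# Stand-in foreground-probability maps for three patches from a binarization network.
patches = [np.random.rand(64, 64) for _ in range(3)]
print(rank_patches_by_uncertainty(patches))
```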

    Download full text (pdf)
    fulltext
  • 33.
    Westphal, Florian
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Lavesson, Niklas
    Jönköpings universitet.
    Representative Image Selection for Data Efficient Word Spotting2020In: Lecture Notes in Computer Science / [ed] Bai X.,Karatzas D.,Lopresti D., Springer, 2020, Vol. 12116, p. 383-397Conference paper (Refereed)
    Abstract [en]

This paper compares three different word image representations as a basis for label-free sample selection for word spotting in historical handwritten documents. These representations are a temporal pyramid representation based on pixel counts, a graph-based representation, and a pyramidal histogram of characters (PHOC) representation predicted by a PHOCNet trained on synthetic data. We show that the PHOC representation can help to reduce the amount of required training samples by up to 69%, depending on the dataset, if it is learned iteratively in an active-learning-like fashion. While this works for larger datasets containing about 1,700 images, for smaller datasets with 100 images we find that the temporal pyramid and the graph representation perform better.
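The PHOC representation has a compact definition: at pyramid level n the word is split into n regions, and each region records which alphabet characters fall into it. A minimal sketch follows; it assigns characters by midpoint, which is a simplification, and real PHOC implementations typically use overlap rules and often add bigram levels:

```python
import string
import numpy as np

def phoc(word, levels=(1, 2, 3, 4, 5), alphabet=string.ascii_lowercase):
    """Simplified PHOC: per-level character presence, assigned by character midpoint."""
    index = {c: i for i, c in enumerate(alphabet)}
    parts = []
    for level in levels:
        regions = np.zeros((level, len(alphabet)))
        for k, char in enumerate(word):
            if char not in index:
                continue
            midpoint = (k + 0.5) / len(word)            # position of char in [0, 1)
            region = min(int(midpoint * level), level - 1)
            regions[region, index[char]] = 1.0
        parts.append(regions.ravel())
    return np.concatenate(parts)

print(phoc("spotting").shape)  # (1 + 2 + 3 + 4 + 5) * 26 = 390
```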

    Download full text (pdf)
    fulltext
  • 34.
    Westphal, Florian
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    User Feedback and Uncertainty in User Guided Binarization2018In: International Conference on Data Mining Workshops / [ed] Tong, H; Li, Z; Zhu, F; Yu, J, IEEE Computer Society, 2018, p. 403-410, article id 8637367Conference paper (Refereed)
    Abstract [en]

In a child’s development, the child’s inherent ability to construct knowledge from new information is as important as explicit instructional guidance. Similarly, mechanisms to produce suitable learning representations, which can be transferred and allow the integration of new information, are important for artificial learning systems. However, equally important are modes of instructional guidance, which allow the system to learn efficiently. Thus, the challenge for efficient learning is to identify suitable guidance strategies together with suitable learning mechanisms.

In this paper, we propose guided machine learning as a source for suitable guidance strategies. We distinguish between sample selection based and privileged information based strategies, and evaluate three sample selection based strategies on a simple transfer learning task. The evaluated strategies are random sample selection, i.e., supervised learning; user-based sample selection based on readability; and user-based sample selection based on readability and uncertainty. We show that sampling based on readability and uncertainty tends to produce better learning results than the other two strategies. Furthermore, we evaluate the use of the learner’s uncertainty for self-directed learning and find that effects similar to the Dunning-Kruger effect prevent this use case. The learning task in this study is document image binarization, i.e., the separation of text foreground from page background. The source domain of the transfer is texts written on paper in Latin characters, while the target domain is texts written on palm leaves in Balinese script.

    Download full text (pdf)
    fulltext
  • 35.
    Westphal, Florian
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Document Image Binarization Using Recurrent Neural Networks2018In: Proceedings - 13th IAPR International Workshop on Document Analysis Systems, DAS 2018, IEEE, 2018, p. 263-268Conference paper (Refereed)
    Abstract [en]

In the context of document image analysis, image binarization is an important preprocessing step for other document analysis algorithms, but it is also relevant on its own, as it improves the readability of images of historical documents. While historical document image binarization is challenging due to common image degradations, such as bleed-through, faded ink or stains, achieving good binarization performance in a timely manner is a worthwhile goal to facilitate efficient information extraction from historical documents. In this paper, we propose a recurrent neural network based algorithm using Grid Long Short-Term Memory cells for image binarization, as well as a pseudo F-Measure based weighted loss function. We evaluate the binarization and execution performance of our algorithm for different choices of footprint size, scale factor and loss function. Our experiments show a significant trade-off between binarization time and quality for different footprint sizes. However, we see no statistically significant difference when using different scale factors, and only limited differences for different loss functions. Lastly, we compare the binarization performance of our approach with the best performing algorithm in the 2016 handwritten document image binarization contest and show that both algorithms perform equally well.
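The pseudo F-Measure based weights are specific to the paper; the general mechanism of a per-pixel weighted binarization loss can be sketched in PyTorch, with a simple foreground up-weighting standing in for the paper's weights (fg_weight is an assumed parameter, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def weighted_binarization_loss(logits, target, fg_weight=4.0):
    """Per-pixel BCE where text (foreground) pixels receive a higher weight.

    logits, target: tensors of shape (batch, 1, H, W); target entries are 0 or 1.
    """
    weights = torch.where(target > 0.5,
                          torch.full_like(target, fg_weight),
                          torch.ones_like(target))
    return F.binary_cross_entropy_with_logits(logits, target, weight=weights)

logits = torch.randn(2, 1, 64, 64)
target = (torch.rand(2, 1, 64, 64) > 0.8).float()  # sparse text pixels
print(weighted_binarization_loss(logits, target))
```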

    Download full text (pdf)
    fulltext
  • 36.
    Yang, Fan
    et al.
    Universiti Teknologi Malaysia (UTM), Malaysia.
    Ismail, Nor Azman
    Universiti Teknologi Malaysia (UTM), Malaysia.
    Pang, Yee Yong
    Universiti Teknologi Malaysia (UTM), Malaysia.
    Kebande, Victor R.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Al-Dhaqm, Arafat
    Universiti Teknologi PETRONAS, Malaysia.
    Koh, Tieng Wei
    Universiti Teknologi PETRONAS, Malaysia.
    A Systematic Literature Review of Deep Learning Approaches for Sketch-Based Image Retrieval: Datasets, Metrics, and Future Directions2024In: IEEE Access, E-ISSN 2169-3536, Vol. 12, p. 14847-14869Article, review/survey (Refereed)
    Abstract [en]

Sketch-based image retrieval (SBIR) utilizes sketches to search for images containing similar objects or scenes. Due to the proliferation of touch-screen devices, sketching has become more accessible and has therefore received increasing attention. Deep learning has emerged as a potential tool for SBIR, allowing models to automatically extract image features and learn from large amounts of data. To the best of our knowledge, there is currently no systematic literature review (SLR) of SBIR with deep learning. Therefore, the aim of this review is to incorporate related works into a systematic study, highlighting the main contributions of individual researchers over the years, with a focus on past, present and future trends. To achieve the purpose of this study, 90 studies from 2016 to June 2023 in 4 databases were collected and analyzed using the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) framework. The specific models, datasets, evaluation metrics, and applications of deep learning in SBIR are discussed in detail. This study found that Convolutional Neural Networks (CNN) and Generative Adversarial Networks (GAN) are the most widely used deep learning methods for SBIR. A commonly used dataset is Sketchy, especially in the recent zero-shot sketch-based image retrieval (ZS-SBIR) task. The results show that mean average precision (mAP) is the most commonly used metric for the quantitative evaluation of SBIR. Finally, we provide some future directions and guidance for researchers based on the results of this review.
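Mean average precision, which the review identifies as the dominant SBIR metric, is straightforward to compute from ranked retrieval lists. The following is a generic implementation, not tied to any surveyed system:

```python
import numpy as np

def average_precision(relevant_flags):
    """AP for one ranked retrieval list; flags are 1 for relevant, 0 otherwise."""
    flags = np.asarray(relevant_flags, dtype=float)
    if flags.sum() == 0:
        return 0.0
    precision_at_hits = np.cumsum(flags) / (np.arange(len(flags)) + 1)
    return float((precision_at_hits * flags).sum() / flags.sum())

def mean_average_precision(all_rankings):
    """mAP over one ranked relevance list per query."""
    return float(np.mean([average_precision(r) for r in all_rankings]))

# Two sketch queries: ranked gallery relevance (1 = same category as the sketch).
print(mean_average_precision([[1, 0, 1, 0], [0, 1, 1, 0]]))  # ~0.708
```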

    Download full text (pdf)
    fulltext
  • 37.
    Zhao, Mengqiao
    et al.
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Hochuli, Andre Gustavo
    Pontifical Catholic University of Parana (PPGIa/PUCPR), BRA.
    Cheddad, Abbas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    End-to-End Approach for Recognition of Historical Digit Strings2021In: Lecture Notes in Computer Science / [ed] Lladós J., Lopresti D., Uchida S., Springer Science and Business Media Deutschland GmbH , 2021, p. 595-609Conference paper (Refereed)
    Abstract [en]

The plethora of digitised historical document datasets released in recent years has rekindled interest in advancing the field of handwriting pattern recognition. In the same vein, a recently published dataset, known as ARDIS, presents handwritten digits manually cropped from 15,000 scanned documents of Swedish church books that exhibit various handwriting styles. To this end, we propose an end-to-end, segmentation-free deep learning approach to handle this challenging ancient handwriting style of dates present in the ARDIS dataset (4-digit long strings). We show that, with slight modifications to the VGG-16 deep model, the framework can achieve a recognition rate of 93.2%, resulting in a feasible solution free of heuristic methods, segmentation, and fusion methods. Moreover, the proposed approach outperforms the well-known CRNN method (a model widely applied in handwriting recognition tasks).
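The segmentation-free idea of predicting all four digits jointly, rather than segmenting them first, can be illustrated with a small backbone feeding four parallel 10-way classification heads. This is a simplified stand-in, not the modified VGG-16 used in the paper:

```python
import torch
import torch.nn as nn

class FourDigitRecognizer(nn.Module):
    """Shared backbone plus four parallel 10-way heads, one per digit position."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
        )
        self.heads = nn.ModuleList([nn.Linear(64 * 4 * 4, 10) for _ in range(4)])

    def forward(self, x):
        features = self.backbone(x)
        return [head(features) for head in self.heads]  # one logit vector per digit

model = FourDigitRecognizer()
year_images = torch.randn(8, 1, 64, 128)   # batch of cropped 4-digit year strings
digit_logits = model(year_images)
print([t.shape for t in digit_logits])      # 4 x (8, 10)
```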
