Software quality is critical, as low-quality code, often flagged as 'code smell,' increases technical debt and maintenance costs. There is a timely need for a collaborative model that detects and manages code smells by learning from diverse and distributed data sources while respecting privacy and providing a scalable solution for continuously integrating new patterns and practices in code quality management. However, the current literature still lacks such capabilities. This paper addresses these challenges by proposing a Federated Learning Code Smell Detection (FedCSD) approach, specifically targeting 'God Class,' which enables organizations to collaboratively train distributed ML models while safeguarding data privacy. To validate our approach, we conduct experiments using manually validated datasets to detect and analyze code smell scenarios. Experiment 1, a centralized training experiment, revealed varying accuracies across datasets, with dataset two achieving the lowest accuracy (92.30%) and datasets one and three achieving the highest (98.90% and 99.5%, respectively). Experiment 2, focusing on cross-evaluation, showed a significant drop in accuracy (lowest: 63.80%) when fewer smells were present in the training dataset, reflecting technical debt. Experiment 3 involved splitting the dataset across 10 companies, resulting in a global model accuracy of 98.34%, comparable to the centralized model's highest accuracy. The application of federated ML techniques demonstrates promising performance improvements in code smell detection, benefiting both software developers and researchers.
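To illustrate the kind of federated setup used in Experiment 3, the sketch below shows a generic federated averaging (FedAvg) loop over simulated clients with a simple logistic-regression model; FedCSD's actual model and aggregation scheme are not specified here, and all data, names, and hyperparameters are illustrative assumptions.

# Minimal FedAvg sketch: each "company" trains locally, then parameters are
# averaged weighted by local dataset size. Data and model are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def local_train(w, X, y, lr=0.1, epochs=5):
    # A few logistic-regression gradient steps on one client's private data.
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# Simulate 10 clients, each holding a private (X, y) split.
clients = [(rng.normal(size=(100, 5)), rng.integers(0, 2, 100).astype(float))
           for _ in range(10)]

global_w = np.zeros(5)
for rnd in range(20):                      # communication rounds
    updates, sizes = [], []
    for X, y in clients:
        updates.append(local_train(global_w.copy(), X, y))
        sizes.append(len(y))
    # Weighted average of the client models (FedAvg aggregation).
    global_w = np.average(updates, axis=0, weights=sizes)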
For reliable digital evidence to be admitted in a court of law, it is important to apply scientifically proven digital forensic investigation techniques to corroborate a suspected security incident. Traditionally, digital forensic techniques have focused mainly on computer desktops and servers. However, recent advances in digital media and platforms have created an increased need to apply digital forensic investigation techniques to other subdomains, including mobile devices, databases, networks, cloud-based platforms, and the Internet of Things (IoT) at large. To assist forensic investigators in conducting investigations within these subdomains, academic researchers have attempted to develop several investigative processes. However, many of these processes are domain-specific or describe domain-specific investigative tools. Hence, in this paper, we hypothesize that the literature is saturated with ambiguities. To test this hypothesis, a digital forensic model-oriented Systematic Literature Review (SLR) of the digital forensic subdomains has been undertaken. The purpose of this SLR is to identify the different and heterogeneous practices that have emerged within the specific digital forensic subdomains. A key finding of this review is that there are process redundancies and a high degree of ambiguity among investigative processes across the various subdomains. As a way forward, this study proposes a high-level abstract metamodel that combines the common investigation processes, activities, techniques, and tasks of the digital forensic subdomains. Using the proposed solution, an investigator can effectively organize the knowledge process for a digital investigation.
Sleep is a period of rest that is essential for functional learning ability, mental health, and even the performance of normal daily activities. Sleep-related issues such as insomnia, sleep apnea, and restless legs are becoming more widespread. When appropriately analyzed, recordings of bio-electric signals, such as the electroencephalogram, can tell how well we sleep. Recent advances in machine learning and feature extraction enable improved analyses, commonly referred to as automatic sleep analysis to distinguish them from sleep data analysis by a human sleep expert. This study outlines a Systematic Literature Review and its results to assess the present state of the art in the automatic analysis of sleep data. A search string was organized according to the PICO (Population, Intervention, Comparison, and Outcome) strategy in order to determine which machine learning and feature extraction approaches are used to build an automatic sleep scoring system. According to the review, the American Academy of Sleep Medicine and Rechtschaffen & Kales are the two main scoring standards used in contemporary research. Other types of sensors, such as electrooculography, are employed in addition to electroencephalography to automatically score sleep. Furthermore, the examined research on parameter tuning for machine learning models proved to be incomplete. Based on our findings, different sleep scoring standards, as well as numerous feature extraction and machine learning algorithms with parameter tuning, have a high potential for developing a reliable and robust automatic sleep scoring system to support physicians. In the context of the sleep scoring problem, there are evident gaps that need to be investigated in terms of automatic feature engineering techniques and parameter tuning of machine learning algorithms.
This paper presents an iterative change detection (CD) method based on Bayes' theorem for very high-frequency (VHF) ultra-wideband (UWB) SAR images, considering commonly used clutter-plus-noise statistical models. The proposed detection technique uses the information of the detected changes to iteratively update the data and distribution information, obtaining more accurate clutter-plus-noise statistics and thereby reducing false alarms. The bivariate Rayleigh and bivariate Gaussian distributions are investigated as candidates to model the clutter-plus-noise, and the Anderson-Darling goodness-of-fit test is used to investigate three scenarios of interest. Different aspects related to the distributions are discussed, the observed mismatches are analyzed, and the impact of the chosen distribution on the proposed iterative change detection method is analyzed. Finally, the performance of the proposed iterative method is assessed in terms of probability of detection and false alarm rate and compared with other competitive solutions. The experimental evaluation uses data from real measurements obtained with the CARABAS II SAR system. Results show that the proposed iterative CD algorithm performs better than the other methods.
Synthetic Aperture Radar (SAR) technology has unique advantages but faces challenges in obtaining enough data for noncooperative target classes. We propose a method to generate synthetic SAR data using a modified pix2pix Conditional Generative Adversarial Networks (cGAN) architecture. The cGAN is trained to create synthetic SAR images with specific azimuth and elevation angles, demonstrating its capability to closely mimic authentic SAR imagery through convergence and collapsing analyses. The study uses a model-based algorithm to assess the practicality of the generated synthetic data for Automatic Target Recognition (ATR). The results reveal that the classification accuracy achieved with synthetic data is comparable to that attained with original data, highlighting the effectiveness of the proposed method in mitigating the limitations imposed by noncooperative SAR data scarcity for ATR. This innovative approach offers a promising solution to craft customized synthetic SAR data, ultimately enhancing ATR performance in remote sensing.
This article presents a systematic literature review on hybrid recommendation systems (HRS) in the e-commerce sector, a field characterized by constant innovation and rapid growth. As the complexity and volume of digital data increase, recommendation systems have become essential in guiding customers to services or products that align with their interests. However, the effectiveness of single-architecture recommendation algorithms is often limited by issues such as data sparsity, challenges in understanding user needs, and the cold start problem. Hybridization, which combines multiple algorithms in different ways, has emerged as a dominant solution to these limitations. This approach is utilized in various domains, including e-commerce, where it significantly improves user experience and sales. To capture the recent trends and advancements in HRS within e-commerce over the past six years, we review the state of the art of HRS in e-commerce. This review meticulously evaluates existing research, addressing primary inquiries and presenting findings that contribute to evidence-based decision-making, understanding research gaps, and maintaining transparency. The review begins by establishing fundamental concepts, followed by detailed methodologies, findings from addressing the research questions, and an exploration of critical aspects of HRS. By summarizing and incorporating existing research, this paper offers valuable insights for researchers and outlines potential avenues for future research, ultimately providing a comprehensive overview of the current state and prospects of HRS in e-commerce.
A unique member of the power transformation family is the Box-Cox transformation. The latter can be seen as a mathematical operation that finds the optimum lambda (λ) value maximizing the log-likelihood function to transform data to a normal distribution and to reduce heteroscedasticity. In data analytics, a normality assumption underlies a variety of statistical test models. This technique, however, is best known in statistical analysis for handling one-dimensional data. This paper revolves around the utility of such a tool as a pre-processing step to transform two-dimensional data, namely digital images, and studies its effect. Moreover, to reduce time complexity, it suffices to estimate the parameter lambda in real time for large two-dimensional matrices by merely considering their probability density function as a statistical inference of the underlying data distribution. We compare the effect of this lightweight Box-Cox transformation with well-established state-of-the-art low-light image enhancement techniques. We also demonstrate the effectiveness of our approach through several test-bed data sets for generic improvement of the visual appearance of images and for ameliorating the performance of a colour pattern classification algorithm as an example application. Results with and without the proposed approach are compared using the AlexNet (transfer deep learning) pretrained model. To the best of our knowledge, this is the first time that the Box-Cox transformation is extended to digital images by exploiting histogram transformation.
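As a minimal sketch of the underlying idea, the following estimates the Box-Cox lambda from image intensities and applies the transform; it does not reproduce the paper's real-time, PDF-based estimator, and the intensity shift and rescaling choices are assumptions made only to keep the example runnable.

# Sketch: estimate the Box-Cox lambda for an image and transform its intensities.
import numpy as np
from scipy import stats

def boxcox_image(img):
    # Box-Cox requires strictly positive data, so shift intensities by 1.
    flat = img.astype(np.float64).ravel() + 1.0
    transformed, lam = stats.boxcox(flat)          # lambda maximizing the log-likelihood
    out = transformed.reshape(img.shape)
    # Rescale back to the 0-255 display range for visual comparison.
    out = 255.0 * (out - out.min()) / (out.max() - out.min() + 1e-12)
    return out.astype(np.uint8), lam

if __name__ == "__main__":
    demo = np.random.randint(0, 60, size=(64, 64), dtype=np.uint8)  # dark test image
    enhanced, lam = boxcox_image(demo)
    print("estimated lambda:", lam)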
In this paper, two optimal power allocation strategies for hybrid interweave-underlay cognitive cooperative radio networks (CCRNs) are proposed to maximize channel capacity and minimize outage probability. The proposed power allocation strategies are derived for the case of Rayleigh fading, taking into account the impact of imperfect spectrum sensing on the performance of the hybrid CCRN. Based on the optimal power allocation strategies, the transmit powers of the secondary transmitter and secondary relay are adapted according to the fading conditions, the interference power constraint imposed by the primary network (PN), the interference from the PN to the hybrid CCRN, and the total transmit power limit of the hybrid CCRN. Numerical results are provided to illustrate the effect of the interference power constraint of the PN, the arrival rate of the PN, imperfect spectrum sensing, and the transmit power constraint of the hybrid CCRN on channel capacity and outage probability. Finally, comparisons of the channel capacity and outage probability of underlay, overlay, and hybrid interweave-underlay CCRNs are presented to show the advantages of hybrid spectrum access.
In this paper, we consider a two-way cognitive cooperative radio network (TW-CCRN) with hybrid interweave-underlay spectrum access in the presence of imperfect spectrum sensing. Power allocation strategies are proposed that maximize the sum-rate and minimize the outage probability of the hybrid TW-CCRN. Specifically, based on the state of the primary network (PN), fading conditions, and system parameters, suitable power allocation strategies subject to the interference power constraint of the PN are derived for each transmission scenario of the hybrid TW-CCRN. Given the proposed power allocation strategies, we analyze the sum-rate and outage probability of the hybrid TW-CCRN over Rayleigh fading, taking imperfect spectrum sensing into account. Numerical results are presented to illustrate the effect of the arrival rate, interference power threshold, transmit power of the PN, imperfect spectrum sensing, and maximum total transmit power on the sum-rate and outage probability of the hybrid TW-CCRN.
Exploratory testing (ET) is a powerful and efficient way of testing software by integrating the design, execution, and analysis of tests during a testing session. ET is often contrasted with scripted testing, and testing is then seen as a binary choice between exploratory and scripted. In contrast, we posit that testing can exhibit varying degrees of exploration, from fully exploratory to fully scripted. In line with this, we propose a scale for the degree of exploration and define five levels. In our classification, these levels of exploration correspond to the way test charters are defined. We have evaluated this classification through focus groups at four companies and identified factors that influence the choice of exploration level. The results show that the choice among the proposed levels of exploration is influenced by factors such as the ease of reproducing defects, better learning, and verification of requirements, and that the levels can be used as a guide to structure test charters. Our study also indicates that applying a combination of exploration levels can be beneficial in achieving effective testing.
Background: The need for empirical investigations in software engineering is growing. Many researchers nowadays conduct and validate their solutions using empirical research. The survey is an empirical method that enables researchers to collect data from a large population. The main aim of a survey is to generalize the findings.
Aims: In this study, we aim to identify the problems researchers face during survey design and the corresponding mitigation strategies.
Method: A literature review and semi-structured interviews with nine software engineering researchers were conducted to elicit their views on problems and mitigation strategies. All of the interviewed researchers focus on empirical software engineering.
Results: We identified 24 problems and 65 strategies, structured according to the survey research process. The most commonly discussed problem was sampling, in particular the ability to obtain a sufficiently large sample. To improve survey instrument design, evaluation, and execution, recommendations for question formulation and survey pre-testing were given. The importance of involving multiple researchers in the analysis of survey results was stressed.
Conclusions: The elicited problems and strategies may serve researchers during the design of their studies. However, it was observed that some strategies were conflicting. This shows that it is important to conduct a trade-off analysis between strategies.
In this paper, we examine the sensitivity of the digital multimedia broadcasting (DMB) MPEG-2 transport stream (TS) format to transmission errors. To find the sensitivity of different parts of TS packets to transmission errors, each TS packet is divided into four cells: the first three cells comprise 48 bytes each, and the last cell comprises 44 bytes. Bit errors are then introduced into these different parts of the TS packets. The sensitivity of DMB videos to transmission errors and their locations is assessed in terms of the following measures: 1) the number of decoder crashes; 2) the number of decodable videos; 3) the total number of decodable frames; and 4) the objective perceptual video quality of the decoded videos. The structural similarity index and the visual information fidelity criterion are used as objective perceptual quality metrics. Simulations are performed on seven different DMB videos using various bit error rates. The results show that the first cell of a TS packet is highly sensitive to bit errors compared to the subsequent three cells, in terms of both spatial and temporal video quality. Further, the sensitivity decreases from Cell 1 to Cell 4 of a DMB TS packet. The error sensitivity analysis reported in this paper may guide the development of more reliable transmission systems for future DMB systems and services. Specifically, the insights gained from this study may support designing better error control schemes that take the sensitivity of different parts of DMB TS packets into consideration.
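The cell structure described above can be illustrated with a short sketch that splits a 188-byte TS packet into cells of 48, 48, 48, and 44 bytes and injects random bit errors into a chosen cell; the packet content, bit error rate, and helper names are illustrative and not taken from the paper's simulation setup.

# Sketch: divide a 188-byte MPEG-2 TS packet into the four cells described above
# (3 x 48 bytes + 1 x 44 bytes) and flip random bits in one cell.
import random

CELL_SIZES = [48, 48, 48, 44]                      # sums to 188 bytes

def split_cells(packet: bytes):
    cells, offset = [], 0
    for size in CELL_SIZES:
        cells.append(bytearray(packet[offset:offset + size]))
        offset += size
    return cells

def inject_bit_errors(cell: bytearray, ber: float, rng=random):
    for i in range(len(cell)):
        for bit in range(8):
            if rng.random() < ber:
                cell[i] ^= 1 << bit                # flip this bit
    return cell

packet = bytes([0x47]) + bytes(187)                # sync byte + dummy payload
cells = split_cells(packet)
cells[0] = inject_bit_errors(cells[0], ber=1e-3)   # corrupt Cell 1 only
corrupted = b"".join(bytes(c) for c in cells)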
Object detection in aerial images, particularly of vehicles, is highly important in remote sensing applications including traffic management, urban planning, parking space utilization, surveillance, and search and rescue. In this paper, we investigate the ability of three-dimensional (3D) feature maps to improve the performance of deep neural networks (DNNs) for vehicle detection. First, we propose a DNN based on YOLOv3 with various base networks, including DarkNet-53, SqueezeNet, MobileNet-v2, and DenseNet-201. We assessed the base networks and their performance in combination with YOLOv3 in terms of efficiency, processing time, and the memory that each architecture required. In the second part, 3D depth maps were generated using pairs of aerial images and their parallax displacement. Next, a fully connected neural network (fcNN) was trained on 3D feature maps of trucks, semi-trailers, and trailers. A cascade of these networks was then proposed to detect vehicles in aerial images. When the DNN detected a region, its coordinates and confidence levels were used to extract the corresponding 3D features. The fcNN used the 3D features as input to improve the DNN performance. The data set used in this work was acquired from numerous flights of an unmanned aerial vehicle (UAV) across two industrial harbors over two years. The experimental results show that 3D features improved the precision of DNNs from 88.23% to 96.43% and from 97.10% to 100% when using DNN confidence thresholds of 0.01 and 0.05, respectively. Accordingly, the proposed system was able to successfully remove 72.22% to 100% of false positives from the DNN outputs. These results indicate the importance of utilizing 3D features to improve object detection in aerial images in future research.
Chacha20 is a widely used stream cipher known for using permutation functions to enhance resistance against cryptanalysis. Although the existing literature highlights its strengths, its potential susceptibility to differential attacks is worth exploring further. This paper proposes an Extended Chacha20 (EChacha20) stream cipher, which offers a slight improvement over Chacha20. It incorporates enhanced Quarter Round Functions (QR-Fs) with 32-bit input words and Add, Rotate, and XOR (ARX) operations using the constants 16, 12, 8, 7, 4, and 2. With these improved QR-Fs, we expect EChacha20 to be more secure and effective against attacks than Chacha20. The threat model leveraged in this paper considers attacker assumptions based on the Bellare-Rogaway Model (B-RM) and the Chosen Plaintext Attack (CPA) to assess potential security weaknesses. The study then analyzes the EChacha20 cipher using the NIST Statistical Test Suite (NSTS) and demonstrates its effectiveness against differential cryptanalysis by performing a differential attack in which the differences between original and flipped bits are comprehensively analyzed. The NSTS is used to statistically analyze the outcome for uniformity and to evaluate the randomness of the generated sequences, considering 1000 tests over the range {0, 1}. Uniformity is evaluated based on the p-values against a battery of passing sequences, and 100% is achieved for the Runs and Serial (2): Test 1 tests, respectively. The performance evaluation metrics include encryption speed, decryption speed, and memory usage. Based on the tests conducted, it has been observed that, with the increased number of QR-Fs, EChacha20 maintains a good balance in speed, although slightly higher than Chacha20, and with slightly higher memory usage than Chacha20. Despite that, a comparative study has been conducted against state-of-the-art studies, and the outcome shows the significance of the current study. Ultimately, the outcome indicates that the EChacha20 cipher has improved QR-Fs and security properties compared to Chacha20 and may provide a more robust encryption solution for various applications.
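For reference, the standard Chacha20 quarter round applies ARX operations on 32-bit words with the rotation constants 16, 12, 8, and 7, as in the sketch below; EChacha20 is described as extending this rotation set with 4 and 2, and that extension is not reproduced here.

# Standard ChaCha20 quarter round (ARX on 32-bit words, rotations 16/12/8/7).
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32

def quarter_round(a, b, c, d):
    a = (a + b) & MASK32; d ^= a; d = rotl32(d, 16)
    c = (c + d) & MASK32; b ^= c; b = rotl32(b, 12)
    a = (a + b) & MASK32; d ^= a; d = rotl32(d, 8)
    c = (c + d) & MASK32; b ^= c; b = rotl32(b, 7)
    return a, b, c, d

# RFC 8439 test vector for the quarter round:
print([hex(v) for v in quarter_round(0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567)])
# expected: ['0xea2a92f4', '0xcb1cf8ce', '0x4581472e', '0x5881c4bb']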
Assessing the security of IoT-based smart environments such as smart homes and smart cities is becoming fundamentally essential to implementing the correct control measures and effectively reducing security threats and risks brought about by deploying IoT-based smart technologies. The problem, however, is in finding security standards and assessment frameworks that best meet the security requirements as well as comprehensively assess and expose the security posture of IoT-based smart environments. To explore this gap, this paper presents a review of existing security standards and assessment frameworks, which also includes several NIST special publications on security techniques, highlighting their primary areas of focus to uncover those that can potentially address some of the security needs of IoT-based smart environments. Cumulatively, a total of 80 ISO/IEC security standards, 32 ETSI standards, and 37 different conventional security assessment frameworks, which included 7 NIST special publications on security techniques, were reviewed. To present an all-inclusive and up-to-date state of the art, the review process considered both published security standards and assessment frameworks as well as those under development. The findings show that most of the conventional security standards and assessment frameworks do not directly address the security needs of IoT-based smart environments but have the potential to be adapted to them. With this insight into the state-of-the-art research on security standards and assessment frameworks, this study helps advance the IoT field by opening new research directions as well as opportunities for developing new security standards and assessment frameworks that will address future IoT-based smart environment security concerns. This paper also discusses open problems and challenges related to IoT-based smart environment security issues. As a new contribution, a taxonomy of challenges for IoT-based smart environment security concerns, drawn from the extensive literature examined during this study, is proposed in this paper, which also maps the identified challenges to potential proposed solutions.
Chromatic aberration is an error that occurs in color images because camera lenses refract light of different wavelengths at different angles. The common approach today to correcting the error is to use a lookup table for each camera-lens combination, e.g., as in Adobe Photoshop Lightroom or DxO Optics Pro. In this paper, we propose a method that corrects the chromatic aberration error without any prior knowledge of the camera-lens combination and performs the correction directly on the Bayer data, i.e., before the raw image data is interpolated to an RGB image. We evaluate our method in comparison to DxO Optics Pro, a state-of-the-art tool based on lookup tables, using 25 test images and the variance of the color differences (VCD) metric. The results show that our blind method has a similar error correction performance to DxO Optics Pro, but without prior knowledge of the camera-lens setup.
The digital market is expanding rapidly due to key characteristics like decentralization, accessibility, and market diversity enabled by blockchain technology. To contribute to this emerging technology, this study proposes a Predictive Analytics System that provides simplified ten-day reporting for three of the most popular cryptocurrencies with prices of varying magnitudes, namely ADA Cardano, Ethereum, and Binance Coin. The proposed system employs a data-science-based framework and six advanced data-driven machine learning and deep learning algorithms: Support Vector Regressor (SVR), Auto-Regressive Integrated Moving Average (ARIMA), Facebook Prophet, unidirectional LSTM, bidirectional LSTM, and stacked LSTM. Moreover, the experiments are repeated several times to achieve the best results through hyperparameter tuning of each algorithm. This involves selecting an appropriate kernel and a suitable data normalization technique for SVR, determining ARIMA's (p, d, q) values, and optimizing the loss function, number of neurons, hidden layers, and epochs in the LSTM models. For model validation, we utilize widely used evaluation metrics: Mean Absolute Error, Root Mean Squared Error, Mean Absolute Percentage Error, and R-squared. Results demonstrate that ARIMA outperforms the other models in all cases, accurately projecting the price variability within the actual price range. Facebook Prophet, in contrast, performs well only to some extent. The paper suggests that the ARIMA technique offers practical implications for market analysts, enabling them to make well-informed decisions based on accurate price projections.
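As a hedged sketch of the ARIMA modeling and evaluation pipeline described above, the following fits an ARIMA(p, d, q) model to a synthetic price series and scores a ten-day forecast with MAE, RMSE, MAPE, and R-squared; the series and the (2, 1, 2) order are illustrative, not the paper's tuned values.

# Sketch: fit ARIMA and evaluate a 10-day forecast with the four reported metrics.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

prices = np.cumsum(np.random.default_rng(1).normal(0, 1, 300)) + 100  # synthetic series
train, test = prices[:-10], prices[-10:]

model = ARIMA(train, order=(2, 1, 2)).fit()
forecast = model.forecast(steps=10)

mae  = mean_absolute_error(test, forecast)
rmse = np.sqrt(mean_squared_error(test, forecast))
mape = np.mean(np.abs((test - forecast) / test)) * 100
r2   = r2_score(test, forecast)
print(f"MAE={mae:.3f} RMSE={rmse:.3f} MAPE={mape:.2f}% R2={r2:.3f}")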
Sentiment analysis using stemmed Twitter data in various languages is an emerging research topic. In this paper, we apply three data augmentation techniques, namely Shift, Shuffle, and Hybrid, to increase the size of the training data, and then use three key types of deep learning (DL) models, namely the recurrent neural network (RNN), convolutional neural network (CNN), and hierarchical attention network (HAN), to classify stemmed Turkish Twitter data for sentiment analysis. The performance of these DL models is compared with existing traditional machine learning (TML) models. The performance of the TML models was affected negatively by the stemmed data, whereas the performance of the DL models improved greatly with the use of the augmentation techniques. Based on simulation, experimental, and statistical results on identical datasets, it is concluded that the TML models outperform the DL models with respect to both training-time (TTM) and runtime (RTM) complexities, but the DL models outperform the TML models with respect to the most important performance factors as well as the average performance rankings.
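One plausible reading of the Shift and Shuffle augmentations on a tokenized tweet is sketched below for illustration only; the paper's exact operations, and the Hybrid combination of the two, may differ.

# Illustrative token-level sketch of "Shift" and "Shuffle" augmentation.
import random

def shift_augment(tokens, k=1):
    # Rotate the token sequence by k positions.
    k = k % len(tokens)
    return tokens[k:] + tokens[:k]

def shuffle_augment(tokens, rng=random):
    # Randomly reorder the tokens.
    out = tokens[:]
    rng.shuffle(out)
    return out

tweet = "bu film gerçekten çok güzeldi".split()
augmented = [shift_augment(tweet, 2), shuffle_augment(tweet)]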
Opinion mining has witnessed significant advancements in well-resourced languages. For low-resource languages, however, this landscape remains relatively unexplored. This paper addresses this gap by conducting a comprehensive investigation into sentiment analysis in the context of Hausa, one of the most widely spoken languages within the Afro-Asiatic family. To this end, three models based on the Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Hierarchical Attention Network (HAN), all tailored to the unique linguistic characteristics of Hausa, are proposed. Additionally, we have developed the first dedicated lexicon dictionary for Hausa sentiment analysis and a customized stemming method to enhance the accuracy of the bag-of-words approach. Our results indicate that CNN and HAN achieve significantly higher performance than other models such as RNN. While the experimental results demonstrate the effectiveness of the developed deep learning models in contrast to the bag-of-words approach, the proposed stemming method was found to significantly improve the performance of the bag-of-words approach. The findings of this study not only enrich the sentiment analysis domain for Hausa but also provide a foundation for future research endeavors in similarly underrepresented languages.
Current synthetic aperture radar (SAR) image formation algorithms have been developed for pulse radar systems, and it is desirable to use them for frequency-modulated continuous-wave (FMCW) radar systems as well. Since the outputs of pulse radar and FMCW radar differ, these algorithms need to be adapted to the output of the FMCW radar. Besides this, the start-stop approximation, which can be used for the signal processing of pulse radar systems, should be reconsidered for FMCW radar systems because the pulse duration of pulse radar is relatively small in comparison to the modulation time of FMCW radar. This study investigates the phase error caused by the start-stop approximation when processing data measured by an FMCW radar system for synthetic aperture imaging. The important finding is that the start-stop approximation is valid for processing FMCW SAR data in many cases. The approximation may become invalid only if the following circumstances occur simultaneously: high radar signal frequency, long modulation time, high platform speed, and short propagation range. Simulations and experiments performed with a wideband 154 GHz FMCW radar support this statement.
The Neyman-Pearson lemma, i.e., the likelihood ratio test and its generalized version, has been used for the development of synthetic aperture radar (SAR) change detection methods. For detecting changes caused by targets on the ground, such as vehicles, a target model, or at least certain assumptions concerning the targets, is always required to derive a statistical hypothesis test. Without prior knowledge of the targets, it is difficult to make any assumption, and an inappropriate assumption can degrade change detection performance significantly. To avoid this issue, new forms of the likelihood ratio test for SAR change detection are introduced in this paper. The proposed forms are shown to be very flexible: they can be used to develop change detection methods for different types of data, e.g., data in scalar form, data in vector form, data represented as complex numbers, and data represented as real numbers. The flexibility of the proposed forms is also demonstrated by the capability to implement change detection methods in both iterative and non-iterative ways. For illustration purposes, a new change detection method is developed based on one of the introduced forms and tested using TanDEM-X data measured in Karlshamn, Sweden, in 2016.
Sketch-based image retrieval (SBIR) uses sketches to search for images containing similar objects or scenes. Due to the proliferation of touch-screen devices, sketching has become more accessible and has therefore received increasing attention. Deep learning has emerged as a potential tool for SBIR, allowing models to automatically extract image features and learn from large amounts of data. To the best of our knowledge, there is currently no systematic literature review (SLR) of SBIR with deep learning. Therefore, the aim of this review is to incorporate the related work into a systematic study, highlighting the main contributions of individual researchers over the years, with a focus on past, present, and future trends. To this end, 90 studies published between 2016 and June 2023 in 4 databases were collected and analyzed using the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) framework. The specific models, datasets, evaluation metrics, and applications of deep learning in SBIR are discussed in detail. This study found that Convolutional Neural Networks (CNN) and Generative Adversarial Networks (GAN) are the most widely used deep learning methods for SBIR. A commonly used dataset is Sketchy, especially in the recent zero-shot sketch-based image retrieval (ZS-SBIR) task. The results show that Mean Average Precision (mAP) is the most commonly used metric for the quantitative evaluation of SBIR. Finally, we provide some future directions and guidance for researchers based on the results of this review.
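Since mAP is reported as the dominant SBIR metric, the short sketch below shows one common way to compute average precision over ranked retrieval lists and average it across sketch queries; the relevance labels are illustrative and not drawn from any dataset in the review.

# Minimal sketch: mean Average Precision (mAP) over ranked retrieval lists.
import numpy as np

def average_precision(relevant_flags):
    # relevant_flags: 1/0 per retrieved image, in ranked order for one query.
    flags = np.asarray(relevant_flags, dtype=float)
    if flags.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(flags) / (np.arange(len(flags)) + 1)
    return float((precision_at_k * flags).sum() / flags.sum())

queries = [[1, 0, 1, 1, 0], [0, 1, 0, 0, 1]]       # two sketch queries
map_score = np.mean([average_precision(q) for q in queries])
print(f"mAP = {map_score:.3f}")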
This paper introduces a new, publicly available image-based Swedish historical handwritten character and word dataset named Character Arkiv Digital Sweden (CArDIS) (https://cardisdataset.github.io/CARDIS/). The samples in CArDIS were collected from 64,084 Swedish historical documents written by several anonymous priests between 1800 and 1900. The character dataset contains 116,000 Swedish alphabet images in RGB color space with 29 classes, whereas the word dataset contains 30,000 image samples of ten popular Swedish names as well as 1,000 region names in Sweden. To examine the performance of different machine learning classifiers on the CArDIS dataset, three different experiments are conducted. In the first experiment, classifiers such as Support Vector Machine (SVM), Artificial Neural Networks (ANN), k-Nearest Neighbor (k-NN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Random Forest (RF) are trained on the existing character datasets Extended Modified National Institute of Standards and Technology (EMNIST), IAM, and CVL and tested on the CArDIS dataset. In the second and third experiments, the same classifiers as well as two pre-trained VGG-16 and VGG-19 classifiers are trained and tested on the CArDIS character and word datasets. The experiments show that the machine learning methods trained on existing handwritten character datasets struggle to recognize characters efficiently on the CArDIS dataset, proving that the characters in CArDIS contain unique features and characteristics. Moreover, in the last two experiments, the deep learning-based classifiers provide the best recognition rates.