|BE - Biomedical Engineering
|15:00-15:40 Pozvano predavanje
|Saturnino Luz (Usher Institute, University of Edinburgh, United Kingdom)
Finding Clues to Cognitive Wellbeing through Speech
|F. Strniša, G. Kosec (Jožef Stefan Institute, Ljubljana, Slovenia)
A Model for Jellyfish Detritus Decay through Microbial Processing
|C. Barakat (Juelich Supercomputing Centre, Juelich, Germany), S. Fritsch (RWTH Aachen University Hospital, Aachen, Germany), M. Riedel, S. Brynjólfsson (University of Iceland, Reykjavik, Iceland)
An HPC-Driven Data Science Platform to Speed-up Time Series Data Analysis of Patients with the Acute Respiratory Distress Syndrome
An increasing number of data science approaches that take advantage of deep learning in computational medicine and biomedical engineering require parallel and scalable algorithms using High-Performance Computing systems. In particular, computational methods for analysing clinical datasets consisting of multivariate time series data can benefit from High-Performance Computing when applying compute-intensive Recurrent Neural Networks. This paper proposes a dynamic data science platform consisting of modular High-Performance Computing systems that use accelerators for innovative Deep Learning algorithms to speed up medical applications that take advantage of large biomedical scientific databases. The platform's core idea is to train a set of Deep Learning models rapidly in order to easily combine and compare the different models' forecast (out-of-sample) performance with their past (in-sample) performance. Since this enables a better understanding of which Deep Learning models are useful for specific medical datasets, our case study leverages three data science methods: Gated Recurrent Units, one-dimensional convolutional layers, and their combination. We validate our approach using the open MIMIC-III database in a case study that assists in understanding, diagnosing, and treating a specific condition that affects Intensive Care Unit patients, namely Acute Respiratory Distress Syndrome.
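The platform's model-comparison idea, contrasting forecast (out-of-sample) with past (in-sample) performance, can be sketched with a toy univariate series and a naive persistence forecaster (the series, split point, and forecaster are illustrative placeholders, not the paper's GRU/convolutional models):

```python
# Illustrative sketch (not the authors' pipeline): compare a model's
# in-sample (past) and out-of-sample (forecast) error on one series.
def rmse(actual, predicted):
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5

def persistence_forecast(series):
    """Naive model: predict each value as the previous observation."""
    return series[:-1]

series = [1.0, 1.2, 1.1, 1.4, 1.3, 1.6, 1.5, 1.8]
split = 5
train, test = series[:split], series[split - 1:]  # overlap one point for the first forecast

in_sample_err = rmse(train[1:], persistence_forecast(train))
out_sample_err = rmse(test[1:], persistence_forecast(test))
```

Comparing the two errors per model, as the abstract describes, reveals which models merely memorize the past and which actually forecast.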
|T. Matulić (University of Zagreb Faculty of Electrical Engineering and Computing, Zagreb, Croatia), R. Bagarić (Rudjer Boskovic Institute, Zagreb, Croatia), D. Seršić (University of Zagreb Faculty of Electrical Engineering and Computing, Zagreb, Croatia)
Enhanced Reconstruction for PET Scanner with a Narrow Field of View Using Backprojection Method
There are several reconstruction methods for positron emission tomography (PET), but iterative methods based on maximum likelihood estimation are primarily used. In this paper, we propose a 2D reconstruction method, based on the backprojection (BP) method, for PET scanners that have a narrow field of view (FoV) in the tangential direction.
If a PET scanner has a reduced field of view in the tangential direction, the reconstructed image obtained by 2D BP or 2D filtered backprojection contains significant distortions. We used the Raytest ClearPET™ scanner to obtain measurements. The ClearPET scanner consists of a maximum of 20 detector cassettes arranged in a ring configuration; the scanner we used contains only eight. Knowing the geometry of the ClearPET scanner, we determined the compensation necessary to reduce the undesirable effects caused by the narrow field of view. We simulated the response of all allowed coincidences located on the observed axial cross-section of the ClearPET scanner, which we used to obtain an image for compensation. Because of the finite dimensions of a single crystal in the ClearPET scanner, we also added dithering to the measured and simulated data to minimize the quantization effect resulting from the reconstruction method. Ultimately, the image obtained with the proposed compensation was significantly improved compared to the (filtered) backprojection method.
The proposed method is faster than iterative reconstruction methods, but the resulting images represent the object captured by the ClearPET scanner less accurately. The proposed method can therefore be used as an initial condition in 2D iterative reconstruction to accelerate convergence.
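The compensation described above can be sketched in miniature: backproject the measurement, then divide by the backprojection of a uniform all-ones measurement, which encodes how often each pixel is covered by the scanner's reduced set of lines of response (the toy geometry below is invented for illustration and does not reproduce the ClearPET geometry):

```python
# Hedged sketch of sensitivity-image compensation for a narrow FoV.
def backproject(values, coverage):
    """coverage[i] lists the pixel indices crossed by line of response i."""
    image = [0.0] * (max(p for line in coverage for p in line) + 1)
    for v, line in zip(values, coverage):
        for p in line:
            image[p] += v
    return image

# Narrow FoV: pixel 0 is seen by one line, pixel 1 by three lines.
coverage = [[0, 1], [1, 2], [1, 2, 3]]
measured = [2.0, 3.0, 4.0]

raw = backproject(measured, coverage)
norm = backproject([1.0] * len(measured), coverage)  # sensitivity image
compensated = [r / n if n > 0 else 0.0 for r, n in zip(raw, norm)]
```

Without the division, pixels covered by more lines are artificially brighter; the sensitivity image removes exactly that geometric bias.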
|F. Feradov (Technical University - Varna, Varna, Bulgaria)
Spectral Features for the Classification of Familiarity from EEG Recordings
Familiarity with different objects or stimuli plays an essential role in forming behavioral and emotional responses. The present paper examines the applicability of spectral features to the classification of levels of familiarity from EEG signals. In particular, we examine differences in the power spectral density (PSD) of frequency bands and covariance coefficients as EEG features. The experimental evaluation of the proposed features is conducted using data from the DEAP database and kNN, C4.5 and SVM classifiers, and a mean classification accuracy of up to 99.7% is reported.
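A band-wise PSD feature of the kind used here can be sketched with a plain DFT (the sampling rate, band edges, and test tone are illustrative assumptions; the DEAP pipeline is not reproduced):

```python
import cmath
import math

def band_power(signal, fs, f_lo, f_hi):
    """Mean power of DFT bins whose frequency falls in [f_lo, f_hi)."""
    n = len(signal)
    powers = []
    for k in range(n // 2):
        f = k * fs / n
        if f_lo <= f < f_hi:
            x = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            powers.append(abs(x) ** 2 / n)
    return sum(powers) / len(powers) if powers else 0.0

fs = 128
sig = [math.sin(2 * math.pi * 10 * t / fs) for t in range(128)]  # 10 Hz "alpha" tone
alpha = band_power(sig, fs, 8, 13)   # alpha band: strong for this signal
beta = band_power(sig, fs, 13, 30)   # beta band: near zero
```

Collecting such band powers per channel yields the spectral feature vector that a kNN, C4.5, or SVM classifier can consume.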
|I. Vishinov (Innovation DOOEL, Skopje, Macedonia), M. Gusev (Ss. Cyril and Methodius University, Skopje, Macedonia), L. Poposka, M. Vavlukis (University clinic of cardiology, Skopje, Macedonia), I. Ademi (University Clinic of Endocrinology, Skopje, Macedonia)
Distribution Analysis of Long-Term Heart Rate Variability versus Blood Glucose
This research explores the distribution of long-term (LT) heart rate variability (HRV) parameters compared to the distribution of glycated hemoglobin (HbA1C), which depicts long-term blood glucose levels.
To calculate the correlations, we used the Pearson, Spearman and point-biserial correlation coefficients. To tackle the problem of smaller datasets due to the obstacles of the COVID-19 pandemic, and to obtain realistic results, we also explored the distribution of the averages of the HRV parameters for every patient and their HbA1C levels. Polynomial combinations of the HRV parameters and their distribution relative to HbA1C levels were explored as well.
The HRV parameters fall into one of two categories: time-domain (TD) HRV parameters (SDNN, SDNN-1, ASDNN, ASDNN-1, ASDNN-2, ASDNN-3, SDANN-2, SDANN-3, NN50, NN50-1, pNN50, pNN50-1, RMSSD and RMSSD-1) and non-linear HRV parameters (SD1, SD2, SD1/SD2).
To obtain clean ECG signals and thus uncorrupted HRV calculations, the ECG signal strip was processed, and ectopic beats, artifacts, noise, lost signals and other corrupted segments were removed ahead of the calculations.
The best thresholds that we found for the patient-average datasets were 26.9 for pNN50 with an F1 score of 0.66 and 23.1 for RMSSD-1 with an F1 score of 0.67. The collection of more data in the future is necessary to produce more significant conclusions and predictions.
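The point-biserial correlation used in this study is simply the Pearson coefficient computed against a dichotomous variable, e.g. HbA1C above or below a cut-off. A minimal sketch (the RMSSD values and labels below are invented, not study data):

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Point-biserial = Pearson with one binary variable (1 = HbA1C above cut-off).
rmssd = [45.0, 38.0, 52.0, 30.0, 28.0, 49.0]     # invented HRV values
hba1c_high = [0, 1, 0, 1, 1, 0]                   # invented binary labels
r_pb = pearson(rmssd, hba1c_high)                 # negative: higher HRV, lower HbA1C
```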
|G. Temelkov (Innovation Dooel, Skopje, Macedonia), M. Gusev (Univ. Sts Cyril and Methodius, Skopje, Macedonia)
A Method to Detect Ventricular Fibrillation in Electrocardiograms
The objective is to create an algorithm for the automatic detection of ventricular fibrillation (VF) in electrocardiogram records. Our approach is based on observing and examining sequences of ventricular fibrillation in both their time-domain and frequency-domain representations.
The research, design and validation process, along with comprehensive annotations, is carried out on Physionet reference databases. We use a sliding-window approach and apply the Fast Fourier Transform (FFT) to convert the data from the time domain to the frequency domain. Our approach is based on heuristic analysis of the signals, such as analyzing the detected peaks and troughs, combined with the properties of the signal waveform and the signal energies in the frequency domain. The initial classification results, obtained by applying digital signal processing algorithms with data science methods, classify VF with an F1 score of more than 80%.
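The sliding-window frequency-domain heuristic can be sketched as follows: take a window, find its dominant DFT frequency, and flag it if that frequency falls in a VF-typical band (the sampling rate, band limits, and test signal below are illustrative assumptions, not the authors' tuned thresholds):

```python
import cmath
import math

def dominant_freq(window, fs):
    """Frequency (Hz) of the strongest DFT bin, DC excluded."""
    n = len(window)
    best_k, best_p = 1, 0.0
    for k in range(1, n // 2):
        x = sum(window[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        p = abs(x) ** 2
        if p > best_p:
            best_k, best_p = k, p
    return best_k * fs / n

def looks_like_vf(window, fs, lo=3.0, hi=8.0):
    """Heuristic: VF shows a dominant component at roughly 3-8 Hz
    (illustrative band; real detectors combine several criteria)."""
    return lo <= dominant_freq(window, fs) <= hi

fs = 100
vf_like = [math.sin(2 * math.pi * 5 * t / fs) for t in range(100)]  # 5 Hz oscillation
```

In a full detector this spectral test would be one feature alongside the waveform and energy properties the abstract mentions.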
|E. Shaqiri (Innovation Dooel, Skopje, Macedonia), M. Gusev (University Ss Cyril and Methodius, Faculty of Computer Science and Engineering, Skopje, Macedonia), L. Poposka, M. Vavlukis (University Ss Cyril and Methodius, Medical Faculty, University Clinic of Cardiology, Skopje, Macedonia), I. Ahmeti (University Ss Cyril and Methodius, Medical Faculty, University Clinic of Endocrinology, Skopje, Macedonia)
Developing a Deep Learning Solution to Estimate Instantaneous Glucose Level from Heart Rate Variability
Many studies address the use of IoT devices coupled with machine learning in order to predict and better detect health problems. Diabetes is an issue that society has been struggling with for a very long time. The ease with which ECG signals can be recorded and interpreted provides an opportunity to use Deep Learning techniques to estimate a patient's glucose levels. This research aims at describing a Deep Learning approach that provides models for different short-term heart rate variability measurements.
Our approach is based on a special method to calculate Heart Rate Variability (HRV) that identifies segments, then averages and concatenates them to exploit better feature engineering results. The short-term HRV measurements are used for the determination of instantaneous plasma glucose levels. The Deep Learning method is based on AutoKeras, whose neural architecture search provided the best results for the 15-minute measurements. The evaluated test set gave the following results: RMSE (0.368), MSE (0.193), R squared (51.281), and R squared loss (54.128).
|E. Merdjanovska (Jožef Stefan International Postgraduate School; Jožef Stefan Institute, Ljubljana, Slovenia), A. Rashkovska Koceva (Jožef Stefan Institute, Ljubljana, Slovenia)
Cross-Database Generalization of Deep Learning Models for Arrhythmia Classification
Arrhythmias are a widespread group of heart abnormalities. In the area of computational methods for ECG analysis, much research has been done on automated arrhythmia detection. In order to develop such methods, databases containing various arrhythmia examples are necessary. The most popular such database is the MIT-BIH Arrhythmia database. Various deep learning architectures have shown superior arrhythmia classification performance on the MIT-BIH Arrhythmia database. However, the applicability of deep learning models for arrhythmia classification beyond their study-specific database has not been explored so far. In this paper, we test the cross-database generalization capabilities of a convolutional neural network, shown to successfully detect arrhythmias on MIT-BIH Arrhythmia, on three other public databases that contain the same arrhythmia groups, namely INCART, MIT-BIH Supraventricular Arrhythmia, and European ST-T. Additionally, our focus is on realistic evaluation schemes, namely inter-patient, to evaluate how well established models are able to classify arrhythmias on a much larger number of distinct people not present in the training of the neural network. The results have shown that the cross-database generalization performance decreases if the conditions under which the measurements have been performed (lead position) in the other databases are not the same as in the MIT-BIH Arrhythmia database.
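The inter-patient evaluation scheme mentioned above hinges on splitting records by patient rather than by beat, so that no patient contributes to both training and test sets. A minimal sketch with made-up record dictionaries:

```python
# Sketch of an inter-patient split: group records by patient ID so the
# test set contains only patients unseen during training (IDs invented).
def inter_patient_split(records, test_patients):
    train = [r for r in records if r["patient"] not in test_patients]
    test = [r for r in records if r["patient"] in test_patients]
    return train, test

records = [{"patient": p, "beat": i} for p in ("p1", "p2", "p3") for i in range(3)]
train, test = inter_patient_split(records, {"p3"})

train_ids = {r["patient"] for r in train}
test_ids = {r["patient"] for r in test}
```

A beat-wise random split would leak a patient's morphology into both sets and overstate accuracy, which is exactly what the inter-patient scheme avoids.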
|M. Brložnik (Small Animal Clinic, Veterinary Faculty, University of Ljubljana, Ljubljana, Slovenia), V. Kadunc Kos, P. Kramarič (Clinic for Reproduction and Large Animals, Veterinary Faculty, University of Ljubljana, Ljubljana, Slovenia), A. Domanjko Petrič (Small Animal Clinic, Veterinary Faculty, University of Ljubljana, Ljubljana, Slovenia), V. Avbelj (Department of Communication Systems, Jožef Stefan Institute, Ljubljana, Slovenia)
Electromechanical events in exercise-induced remodelling of the equine heart
Horses that run for prolonged periods of time for athletic performance have larger hearts than physically less active animals. In human endurance athletes, this exercise-induced cardiac remodelling is referred to as athlete's heart. Highly trained individuals have lower resting left ventricular mechanics, i.e., lower systolic left ventricular strain, rotation and twist, and this is an adaptation that is independent of structural remodelling, arterial hemodynamics, and heart rate. Our goal is to obtain simultaneous phonographic and electrocardiographic data, as well as simultaneous echocardiographic and electrocardiographic data in trained and untrained horses to assess whether electromechanical durations are different in exercise-induced remodelling of the heart.
|BE - Biomedical Engineering
|I. Tomasic (Mälardalen University, Västerås, Sweden), R. Trobec (Jozef Stefan Institute, Ljubljana, Slovenia), M. Lindén (Mälardalen University, Västerås, Sweden)
State-Space versus Linear Regression Models between ECG Leads
The first attempts at modeling relationships between electrocardiographic leads were based on measuring lead vectors using models of the human torso (deterministic approach) and on estimating linear regression models between leads of interest (statistical models). Among the most recent attempts, one of the most prominent is the state-space model approach, because of its better noise immunity compared to statistical models estimated by mean squared error. This study uses state-space models to synthesize precordial leads and Frank leads from leads I, II, and V1. The synthesis was evaluated with the linear correlation coefficient (CC) on 200 measurements from Physionet's PTB diagnostic ECG database. The results show better performance of the regression models (mean CC between 0.88 and 0.96) than the state-space models (mean CC between 0.78 and 0.86). The leads were not pre-aligned to the R-peaks, which may be the main cause of the lower performance of the state-space models, as a previous study has also shown. Residual baseline wander (after filtering) was the dominant reason for not obtaining better synthesis results with both methods.
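The regression baseline can be sketched with ordinary least squares and the linear correlation coefficient used for evaluation (a single predictor lead and invented samples are used for brevity; the study regresses each target lead on leads I, II and V1 jointly):

```python
def fit_line(x, y):
    """Ordinary least squares y ≈ a*x + b with one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return a, my - a * mx

def corr(x, y):
    """Linear correlation coefficient (CC) used to score the synthesis."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = (sum((xi - mx) ** 2 for xi in x) *
           sum((yi - my) ** 2 for yi in y)) ** 0.5
    return num / den

# Toy samples (invented): the target lead is exactly twice the predictor lead.
lead_ii = [0.1, 0.5, 1.2, 0.4, 0.0]
lead_v4 = [2 * v for v in lead_ii]

a, b = fit_line(lead_ii, lead_v4)
synthesized = [a * v + b for v in lead_ii]
cc = corr(lead_v4, synthesized)
```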
|D. Cindrić, A. Stanešić, M. Cifrek (Faculty of Electrical Engineering and Computing, Zagreb, Croatia)
Analog Frontend for ECG/EMG Capacitive Electrodes
In this paper, a complete analog frontend for ECG/EMG signal acquisition through non-contact capacitive electrodes is designed. The analog signal acquisition chain consists of a low-noise, low-input-bias-current preamplifier, a passive high-pass filter, a 4th-order active Sallen-Key low-pass filter with a Butterworth approximation, and an amplifier with a total gain of 60 dB. System sensitivity to coupling distance and component tolerances is investigated for different preamplifier topologies. A compact PCB design is proposed, with emphasis on low-noise design and leakage-current mitigation in the preamplifier area.
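For a unity-gain Sallen-Key low-pass stage of the kind used in such a filter chain, the cutoff frequency follows directly from the component values as f_c = 1/(2π·sqrt(R1·R2·C1·C2)); the component values below are illustrative, not the paper's design:

```python
import math

def sallen_key_cutoff(r1, r2, c1, c2):
    """Cutoff frequency (Hz) of a unity-gain Sallen-Key low-pass stage:
    f_c = 1 / (2*pi*sqrt(R1*R2*C1*C2))."""
    return 1.0 / (2 * math.pi * math.sqrt(r1 * r2 * c1 * c2))

# Illustrative values: 10 kOhm resistors, 33 nF capacitors -> ~482 Hz,
# roughly in the range of interest for surface EMG acquisition.
fc = sallen_key_cutoff(10e3, 10e3, 33e-9, 33e-9)
```

A 4th-order Butterworth response is then obtained by cascading two such stages with appropriately staggered Q factors.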
|L. Klaić, A. Stanešić, M. Cifrek (Faculty of Electrical Engineering and Computing, Zagreb, Croatia)
Numerical Modelling of Capacitive Electrodes for Biomedical Signals Measurement
Surface electromyography (sEMG) is a noninvasive technique for measuring muscle action potentials on the skin surface. Numerical methods are often the first step in the design of electrodes and biomedical electronic systems. Thoughtful simulation preceding physical implementation may speed up product development by foreseeing potential deficiencies at the earliest phase. Familiarity with all the possibilities of such platforms and considerate model design in distinctly defined steps is the key to savings in time, memory, processing power and money. The goal of this research is to explore the impact of the size and shape of a capacitive electrode on the quality of capacitive coupling. For this purpose, the COMSOL Multiphysics software is used. In the past thirty years, a considerable number of papers have been dedicated to the implementation of various approximations of biceps muscle anatomy. The primary issue of such finite element method models is the realistic representation of the generation of action potentials in muscle units and their propagation through tissues and skin. In this paper, a concentric-cylinder model of the upper arm is proposed to allow stationary analysis of the electric field distribution at the electrode-fabric interface based on multiple parameters, such as electrode shape and distance, or thickness and dielectric permittivity of the fabric.
|M. Melinščak (Faculty of Electrical Engineering and Computing, Zagreb, Croatia), M. Radmilović, Z. Vatavuk (Sestre milosrdnice University Hospital Center, Zagreb, Croatia), S. Lončarić (Faculty of Electrical Engineering and Computing, Zagreb, Croatia)
AROI: Annotated Retinal OCT Images database
The development of optical coherence tomography (OCT) devices has significantly influenced diagnostics and therapy guidance in ophthalmology. The growing number of available images makes the introduction of robust algorithms for automatic segmentation into clinical practice increasingly important. With the advances in computer vision in recent years, the development of algorithms for segmentation of retinal structures and/or pathological biomarkers has intensified. However, we are experiencing a reproducibility crisis due to a lack of openly available databases. In this paper we give an overview of the new openly available Annotated Retinal OCT Images (AROI) database, which we have developed as a result of a collaboration between one research institution and one hospital. It consists of 1136 annotated B-scans (from 24 patients suffering from age-related macular degeneration) and the associated raw high-resolution images. In each B-scan, three retinal layers and three retinal fluids were annotated by an ophthalmologist. Results for intra- and inter-observer errors are obtained to set a baseline for the validation of machine learning algorithms. We believe that the AROI database offers many possibilities for the computer vision research community specializing in retinal images and represents a step towards developing a robust artificial intelligence system in ophthalmology.
|M. Gambiraža, I. Kesedžić, M. Šarlija, S. Popović, K. Ćosić (Faculty of Electrical Engineering and Computing, Zagreb, Croatia)
Classification of Cognitive Load based on Oculometric Features
Cognitive load is related to the amount of working memory resources used in the execution of various mental tasks. Different multimodal features extracted from peripheral physiology, brain activity, and oculometric reactions have been used as non-intrusive, reliable and objective measures of cognitive load. In this paper, we use data from 38 participants performing a four-level difficulty n-back task (0-, 1-, 2-, and 3-back task), with their oculometric reactions simultaneously recorded. Based on the neuroanatomic structure and function of the visual system, 19 oculometric features are extracted and organized into 3 groups related to: pupil dilation, blinking, and fixation. The discriminative power of each group of features was evaluated in three-level cognitive load classification using a support vector machine (SVM) model and feature selection, and the achieved classification accuracies were: 50% using only pupil dilation features, 50% using only blink-related features, 50.3% using only fixations-related features. Finally, a 50% classification accuracy was achieved using a combination of all extracted oculometric features. The presented results show that various groups of oculometric features provide complementary information about the subject’s cognitive load. The comparison of the extracted groups of features is given, and the most important features in terms of classification performance are discussed.
|J. Dobša, D. Šebalj (Faculty of Organization and Informatics, University of Zagreb, Varaždin, Croatia), D. Bužić (Faculty of Organization and Informatics, University of Zagreb; College for Information Technologies, Varaždin, Zagreb, Croatia)
Classification of Emotions Based on Text and Qualitative Variables
The aim of this paper is to compare the performance of classification based on subjects' self-reported emotions described by qualitative variables and on textual descriptions of the situation in which the emotion was experienced. The research uses the ISEAR data set, consisting of 7666 samples classified into seven classes representing basic emotions (joy, fear, anger, sadness, disgust, shame, and guilt). For classification based on text, we use logistic regression (LR) as well as the deep learning methods of convolutional neural networks (CNN), long short-term memory networks (LSTM), and convolutional long short-term memory networks (C-LSTM). For classification based on qualitative variables, we use LR and a multilayer perceptron (MLP). Classification performance on textual descriptions is similar for LR and the deep learning methods, ranging in accuracy from 53.65% for CNN to 57.42% for LR, while LR (accuracy of 77.19%) outperforms MLP (accuracy of 51.83%) in classification based on qualitative variables.
Also, we constructed ensemble classifiers based on text and qualitative variables, showing improved performance compared to the separate classifiers, with accuracy ranging from 63.04% (CNN for text and MLP for qualitative variables) to 80.55% (LR for both text and qualitative variables).
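A text-plus-qualitative ensemble of this kind can be sketched as late fusion: average the class-probability vectors of the two classifiers and take the argmax (the probabilities below are invented for illustration; the paper's exact combination scheme may differ):

```python
# Minimal late-fusion sketch: weighted average of two classifiers'
# class-probability vectors over the seven ISEAR emotion classes.
def fuse(p_text, p_qual, w=0.5):
    return [w * a + (1 - w) * b for a, b in zip(p_text, p_qual)]

emotions = ["joy", "fear", "anger", "sadness", "disgust", "shame", "guilt"]
p_text = [0.30, 0.05, 0.10, 0.25, 0.10, 0.10, 0.10]  # invented text-model output
p_qual = [0.10, 0.05, 0.05, 0.60, 0.05, 0.10, 0.05]  # invented qualitative-model output

fused = fuse(p_text, p_qual)
predicted = emotions[max(range(len(fused)), key=fused.__getitem__)]
```

Here the qualitative classifier's confidence tips the fused prediction even though the text classifier alone would have chosen differently, which is the complementary effect the ensemble exploits.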
|S. Popov (Jožef Stefan International Postgraduate School, Ljubljana, Slovenia), J. Tratnik, M. Breskvar, D. Mazej, M. Horvat, S. Džeroski (Jožef Stefan Institute, Ljubljana, Slovenia)
Relating Prenatal Hg Exposure and Neurological Development in Children with Machine Learning
We use machine learning techniques to address a problem from environmental epidemiology. Prenatal exposure to mercury (Hg) can impair the neurological development of children. We study the relations between exposure factors and indices of neurological development of children. To this end, we use predictive modelling approaches for regression, as well as methods for estimating feature importance. While the learned models are insufficient for accurate prediction of neurodevelopment indices, they nevertheless point to the exposure factors that most influence neurological development.
|K. Babić, M. Petrović, S. Beliga, S. Martinčić-Ipšić (Department of Informatics, University of Rijeka, Rijeka, Croatia), M. Pranjić (Jožef Stefan International Postgraduate School, Zagreb, Croatia), A. Meštrović (Department of Informatics, University of Rijeka, Rijeka, Croatia)
Prediction of COVID-19 Related Information Spreading on Twitter
In this paper, we explore the influence of COVID-19 related content in tweets on their spreadability. The experiment is performed in two steps on a dataset of tweets in the Croatian language posted during the COVID-19 pandemic. In the first step, we train a feedforward neural network model to predict whether a tweet is highly spreadable or not. The trained model achieves 62.5% accuracy on the binary classification problem. In the second step, we use this model in a set of experiments for predicting the average spreadability of tweets. In these experiments, we separate the original dataset into two disjoint subsets: one composed of tweets filtered using COVID-19 related keywords and the other containing the rest of the tweets. Additionally, we modified these two subsets by adding and removing tokens in tweets, thus making them artificially COVID-19 related or not related. Our preliminary results indicate that tweets that are semantically related to COVID-19 have on average higher spreadability than tweets that are not semantically related to COVID-19.
|J. Niehaus, N. Caporusso (Northern Kentucky University, Highland Heights, United States)
An Infrastructure for Integrated Temperature Monitoring and Social Tracking
The unanticipated outbreak of the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has rapidly escalated into a global emergency that infected more than 100 million individuals and caused over 2 million deaths worldwide. Consequently, national governments have enacted drastic provisions such as shelter-in-place and lockdown orders, especially in the first phases of the Coronavirus disease 2019 (COVID-19) pandemic and in case of faster growth of the infection curve. Subsequently, other types of preventive non-pharmaceutical practices, including social distancing and Personal Protection Equipment (PPE) requirements, have been implemented to ensure the safety of individuals while releasing restrictions. Unfortunately, other countermeasures, such as temperature monitoring and social tracking using smartphone applications, have been less successful, because of their inherent limitations and poor user adoption.
In this paper, we introduce an infrastructure-based solution based on a centralized database and on a distributed network of acquisition nodes that leverages the combined use of temperature monitoring and social tracking. Our system aims at filling the effectiveness gap (e.g., in the case of asymptomatic and presymptomatic individuals) as well as increasing the information available to limit potential virus transmission. We discuss the architecture of the system, detail its components, discuss its efficacy, and address concerns related to users’ privacy and data access.
|V. Miletić (University of Rijeka Department of Informatics, Rijeka, Croatia), P. Nikolić (RxTx Research, Zagreb, Croatia), D. Kinkela (University of Rijeka Department of Informatics, Rijeka, Croatia)
Structure-based Molecular Docking in the Identification of Novel Inhibitors Targeting SARS-CoV-2 Main Protease
There have been several studies of natural compounds used as SARS-CoV-2 inhibitors. Among those, we selected the most viable natural anti-viral compound, rutin, as the basis for a structure-based molecular docking campaign using databases of commercially available compounds that are potential ligands. The known and well-studied SARS-CoV-2 main protease structure was used as a target, and the Asinex screening library was filtered to select structurally similar and pharmacokinetically feasible compounds. Before the screening campaign, the protein was minimized and the selected compounds were protonated and parametrized. A modified version of the rDock high-throughput virtual screening tool, called RxDock, was used for molecular docking. RxDock was developed to enable running large molecular docking studies on modern computer systems, including supercomputers and clouds. Our approach combines the traditional approach of the pharmaceutical industry, where natural compounds are used as a template to develop novel inhibitors, with novel high-throughput virtual screening techniques and validation tests. It promises to pave the way towards an agnostic approach to the development of novel inhibitors while keeping the cost of both the computational protocols and the bioassays lower than in the current drug discovery pipeline.
|DS - Data Science
|E. Građanin, I. Prazina, V. Okanovic (Faculty of Electrical Engineering, Sarajevo, Bosnia and Herzegovina)
Automatic Web Page Robustness Grading
This paper presents a solution to the problem of automating web page robustness grading. The robustness of a web page is best defined as the property of a specific web page to keep the layout and style of its elements after different modifications are applied. The rapid development of the web has enabled the quick creation of numerous web pages, but the question is what the quality of those web pages is in terms of robustness. Automatic grading enables a relatively fast way of creating a metric, in terms of a score that a specific web page receives after being tested for its level of robustness. The research framework consists of different technologies and concepts that have been used during the implementation of a practical solution. The paper also describes the data structures used to represent web pages, as well as the machine learning methods, such as neural networks, used to calculate the robustness score.
|G. Hamzaj (SEEU, Prishtina, Kosovo), Z. Dika (SEEU, Tetovo, Macedonia)
Choosing the Framework for Managing Data Quality in the Organization for the Defined Data Quality Dimensions
Providing quality data is one of the main objectives of various organizations and institutions seeking to improve the quality of the services they provide. Assessing and improving data quality is a very difficult task due to the very large volumes and the diverse data sources, with differing data structures, that exist today. In this paper we evaluate the dimensions, metrics and frameworks, including the respective assessment and improvement processes, and thus construct a framework for the assessment and improvement of data quality in an organization. Given the many existing frameworks, the main focus of this paper is on frameworks that have wide implementation in various fields rather than only in specific fields.
|E. Vušak, V. Kužina, A. Jović (University of Zagreb Faculty of Electrical Engineering and Computing, Zagreb, Croatia)
A Survey of Word Embedding Algorithms for Textual Data Information Extraction
Unlike other popular data types, such as images, textual data cannot be easily converted into a numerical form that machine learning algorithms can process. Therefore, text must be embedded into a vector space using embedding algorithms. These algorithms attempt to encapsulate as much information as possible from the text into a resulting vector space. Natural language is complex and contains numerous layers of information. Information can be obtained from a sequence of characters or subword units that make up the word. It can also be derived from the context in which a word occurs. For this reason, a variety of word embedding algorithms have been developed over time, which use different pieces of information in different ways. In this paper, the currently available word embedding algorithms are described and it is shown what kind of information these algorithms use. After analyzing these algorithms, we discuss how it can be advantageous to use combinations of different types of information in different research and application areas.
|V. Kužina, E. Vušak, A. Jović (University of Zagreb Faculty of Electrical Engineering and Computing, Zagreb, Croatia)
Methods for Automatic Sensitive Data Detection in Large Datasets: a Review
In recent years, the need for detection and de-identification of sensitive data in both structured and unstructured forms has increased. The methods used for these tasks have evolved accordingly and currently there are many solutions in different areas of interest. This paper describes the need for the detection of sensitive data in large datasets and describes the challenges associated with automating the detection process. It gives a brief overview of the rule-based and machine learning methods used in this area and examples of their application. The advantages and disadvantages of the described methods are also discussed. We show that the most recent detection solutions are based on the latest and most advanced models proposed in the field of natural language processing, but that there are still some rule-based methods used for certain types of sensitive data.
|P. Zupančič (Faculty of Information studies, Novo mesto, Slovenia), P. Panov (Jožef Stefan Institute, Ljubljana, Slovenia)
The Influence of Window Size on the Prediction Power in the Case of Absenteeism Prediction from Timesheet Data
In this paper, we focus on the task of predicting an employee's absence based on historical timesheet data. More specifically, based on one-year historical data, we want to examine how the size of the time window of the historical timesheet profiles influences the prediction power in the case of one-week-ahead absenteeism prediction. In our case, the time window denotes an absence profile for a sequence of weeks that precede the target week, which is then used as a descriptor when building the predictive model. The data are obtained from MojeUre, a system for tracking and recording working hours, and include timesheet profiles of employees from different companies in Slovenia. We design different analysis scenarios and use a selection of regression algorithms from the Weka data mining software as the main tool for building predictive models. To analyse the influence of the window size on the predictive power, we use different performance evaluation measures as indicators. In general, we conclude that using a longer window size helps to achieve better predictive performance.
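The windowing scheme can be sketched directly: for each target week, the preceding `window` weekly absence counts form the feature vector (the weekly counts below are invented; real profiles would be built per employee):

```python
# Sketch of window-based example construction for one-week-ahead prediction.
def make_examples(weekly_absences, window):
    """Turn a weekly absence sequence into (features, target) pairs,
    where features are the `window` weeks preceding the target week."""
    examples = []
    for t in range(window, len(weekly_absences)):
        features = weekly_absences[t - window:t]
        target = weekly_absences[t]
        examples.append((features, target))
    return examples

absences = [0, 1, 0, 0, 2, 1, 0, 3]   # invented weekly absence counts
short = make_examples(absences, 2)     # shorter window, more examples
long_ = make_examples(absences, 4)     # longer window, richer features
```

The trade-off the paper studies is visible even here: a longer window gives each example more history but yields fewer training examples from the same year of data.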
|B. Polonijo (Srednja škola za elektrotehniku i računalstvo, Rijeka, Croatia), S. Šuman, I. Šimac (Veleučilište u Rijeci, Rijeka, Croatia)
Propaganda Detection Using Sentiment Aware Ensemble Deep Learning
In today's highly globalized world with vast information transfer, it is increasingly difficult to distinguish valid information from attempts to manipulate human attitudes through propaganda, a growing threat due to its spread and sophistication. This paper proposes a deep learning method that combines sentiment scores with traditional Word2Vec vectors into a sentiment-aware representation carrying both semantic and emotional information, which results in a more accurate propaganda classification model. Word2Vec vectors capture the semantic meaning of words and their structures in natural language processing, while sentiment analysis with VADER's built-in emotional dictionary produces a text sentiment score representing the emotional information. Combining the two preserves the flexibility of the Word2Vec vectors while enriching them with the output of sentiment analysis. Tests comparing a Word2Vec model without sentiment data against the sentiment-aware representation, using standard deep learning methods for propaganda detection, show that this hybrid approach increases propaganda classification accuracy.
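The core of the sentiment-aware representation can be sketched as appending a sentiment score to a word embedding. The toy vectors and lexicon below are illustrative stand-ins for trained Word2Vec vectors and VADER compound scores, not the paper's actual model.

```python
# Sketch of a "sentiment-aware" vector: concatenate a semantic embedding
# with an emotional-score dimension. All values below are made up.

TOY_EMBEDDINGS = {
    "enemy":  [0.2, -0.7, 0.1],
    "friend": [0.3,  0.5, -0.2],
}
TOY_SENTIMENT = {"enemy": -0.6, "friend": 0.7}  # VADER-like compound scores

def sentiment_aware_vector(word):
    """Append the word's sentiment score to its semantic vector."""
    return TOY_EMBEDDINGS[word] + [TOY_SENTIMENT.get(word, 0.0)]

vec = sentiment_aware_vector("enemy")
```

The downstream classifier then sees both the semantic neighborhood of a word and its emotional polarity in a single input vector.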
|L. Jovanovska (Jožef Stefan International Postgraduate School, Ljubljana, Slovenia), P. Panov (Jožef Stefan Institute, Ljubljana, Slovenia)
Semantic Representation of Machine Learning and Data Mining Algorithms
In this paper, we describe an extension of the ontology of core data mining entities (OntoDM-core) that will improve the semantic representation of machine learning and data mining algorithms. The OntoDM-core acknowledges the multi-faceted aspect of algorithms and accordingly provides entities, such as algorithm specification, algorithm implementation, and algorithm execution. We build upon this representation and include a more detailed representation of algorithms, including terms such as hyperparameter, optimization problem, complexity function, etc. Furthermore, we discuss the potential applications of the ontology. It can be used as a backbone of a repository and knowledge base for storing semantic annotations of algorithms and for assisting algorithm developers and domain experts with the task of manual semantic annotation. Ultimately, the corpus of manually annotated algorithms using the ontology vocabulary will serve as a foundation for automating the process of semantic annotation of algorithms from text using natural language processing techniques.
|M. Domladovac (Faculty of Organization and Informatics, Čazma, Croatia)
Comparison of Neural Network with Gradient Boosted Trees, Random Forest, Logistic Regression and SVM in predicting student achievement
Student success is paramount at all levels of education, especially for universities, and improving the success and quality of enrolled students is one of their most important concerns. It is important to observe early symptoms of students at risk and implement preventive measures against student dropout. In this research, we use data mining techniques to identify the factors that affect student success, using data consisting of log data and grades of students from a course at the University of Zagreb, Faculty of Organization and Informatics in Croatia. Machine learning methods are used to evaluate the performance of deep learning compared to traditional machine learning methods in the task of binary classification of whether a student fails or passes the exam. The results show that the deep neural network performs very well, achieving the second-best results, and that with further optimization there are many opportunities for even better generalization.
|S. Popov (Jožef Stefan International Postgraduate School, Ljubljana, Slovenia), K. Kavkler (Institute for the Protection of Cultural Heritage of Slovenia, Ljubljana, Slovenia), S. Džeroski (Jožef Stefan International Postgraduate School, Ljubljana, Slovenia)
Using Machine Learning to Identify Factors Contributing to Mould in the Celje Ceiling Painting
This paper presents the analysis of data about the damaged paintings on the Celje ceiling in the Celje Regional Museum. Due to old age and micro-climate conditions, the paintings have started to deteriorate. The goal of this analysis is to build predictive models for the damage of the paintings from the existing data on those paintings and to rank different factors by their contribution to the deterioration (moulding). The data was made available through the Institute for the Protection of Cultural Heritage of Slovenia. It was preprocessed in Python, and the data mining tasks (classification and feature ranking) were carried out in Weka. All models were built using 10-fold cross-validation and were evaluated on their accuracy (CA), per-class precision and recall, and area under the ROC curve (AUC). The obtained results give some insights as to what may cause the mould and appear to be consistent with the knowledge of the domain experts.
|D. Janakieva, G. Mirceva, S. Gievska (Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, Skopje, Macedonia)
Fake News Detection by Using Doc2Vec Representation Model and Various Classification Algorithms
Dissemination of fake news and disinformation on social media platforms poses a serious threat to society. Distinguishing between fake and truthful information is not an easy task even for humans, and automatic detection of fake news has received considerable attention in recent years. In this paper, we focus on the task of automatic detection of fake news using several machine learning algorithms. The impact of various linguistic features and preprocessing techniques on the performance of the classifiers has been evaluated using a dataset containing 17,324 news entries. The experimental results are encouraging, with the most successful models obtaining an accuracy of 99.97%.
|D. Nagavci Mati, M. Hamiti, B. Selimi, J. Ajdari (South East European University, Tetovo, Macedonia)
Building a Spell-Check Dictionary for a Low-Resource Language by Comparing Word Usage
Each language has its own vocabulary, spoken by its community of speakers. Some languages are well resourced, and NLP methods typically perform better for them; for a large number of low-resource languages, on the other hand, there is a lack of sufficient annotated data for efficiently applying unsupervised NLP methods. A spell checker, which identifies correctly spelled as well as misspelled words, is a necessity for composing any document in a language. The aim of this paper is to present a spell-check dictionary for the Albanian language built by comparing word usage across various texts. The dictionary entries are defined from a large text collection and then refined through a comparative review of word usage frequency. The corpora include 250k Albanian sentences from different fields such as computer science, economics, law, medicine, politics, tourism, art, psychology, etc. This spell-check dictionary would further contribute to the ease of use of the Albanian language in electronic media.
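The frequency-based dictionary construction can be sketched in a few lines. The tiny corpus and the frequency threshold below are illustrative assumptions; the paper works with 250k Albanian sentences and a more careful comparison of usage across fields.

```python
# Sketch: derive spell-check dictionary entries by comparing word usage
# frequency across texts. Words seen often enough are accepted as
# correct spellings; rare forms are left out as likely misspellings.
from collections import Counter

def build_dictionary(sentences, min_count=2):
    counts = Counter(w.lower() for s in sentences for w in s.split())
    return {w for w, c in counts.items() if c >= min_count}

# Toy Albanian-looking corpus with one deliberate misspelling ("gjuhaa"):
corpus = ["gjuha shqipe", "gjuha dhe teksti", "gjuhaa e gabuar"]
dictionary = build_dictionary(corpus)
```

A real system would also compare per-field frequencies so that domain terms common in, say, medicine are not discarded as misspellings.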
|Data Science
|G. Thakkar, N. Mikelić Preradović, M. Tadić (Faculty of Humanities and Social Sciences, Zagreb, Croatia)
Negation Detection Using NooJ
The availability of extensive annotated data for natural language processing tasks is an unsolved problem. Transfer learning techniques usually mitigate this issue by relying on existing models in another language. If no such models exist, the whole transfer learning setup becomes an implausible option. This paper presents a simple approach that uses a grammar rule as a noisy labelling function to train a classic generative-discriminative classification setup. The approach relies on a simple NooJ grammar along with a series of other data labelling functions. We evaluate the approach on the Conan-Doyle dataset for the task of explicit negation detection in a low-resource setting and report an improvement of 2% over the baseline.
|G. Oparin, V. Bogdanova, A. Pashinin (Institute for System Dynamics and Control Theory of SB RAS, Irkutsk, Russian Federation)
Service-oriented Application for Solving Parametric Synthesis Problem of a Boolean Network with Given Dynamic Properties
A constructive logical method for the synthesis of the characteristic matrix of a linear binary dynamical system with a given set of one-point attractors and one-step dynamics of reaching this set from any state is proposed. The problem conditions are written as a quantified Boolean formula, whose truth is subsequently verified using a QSAT solver. The solver provides the values of the elements of the required matrix as a certificate. The proposed method is implemented using automation tools for constructing and executing composite services in an applied microservices package for solving problems of qualitative research of binary dynamic systems. These tools provide cloud services for obtaining a quantified Boolean formula in QDIMACS format, verifying its truth, obtaining a constructive solution to the considered problem, and supporting synchronization of cloud and local data in a hybrid cloud infrastructure using Dew Computing.
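For readers unfamiliar with the QDIMACS format mentioned above, a minimal (toy) instance looks as follows; the formula encoded here is an illustrative example, not one of the paper's synthesis conditions:

```
c toy QBF: forall x1 exists x2 . (x1 or x2) and (not x1 or not x2)
p cnf 2 2
a 1 0
e 2 0
1 2 0
-1 -2 0
```

The `a`/`e` lines declare universally and existentially quantified variables (each terminated by `0`), followed by CNF clauses in standard DIMACS notation; a QSAT solver checks whether the quantified formula is true.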
|C. Barakat (Juelich Supercomputing Centre, Juelich, Germany), M. Riedel, S. Brynjólfsson (University of Iceland, Reykjavik, Iceland), G. Cavallaro, J. Busch (Juelich Supercomputing Centre, Juelich, Germany)
Design and Evaluation of an HPC-based Expert System to Speed-up Retail Data Analysis Using Residual Networks Combined with Parallel Association Rule Mining and Scalable Recommenders
Due to the Covid-19 pandemic, the retail industry has shifted many business models toward online purchasing, which produces large quantities of transaction data (i.e., big data). Practical data science methods infer from this data seasonal trends about products, spikes in purchases, the effectiveness of advertising campaigns, or brand loyalty, but require extensive processing power, leveraging high-performance computing (HPC) to deal with large transaction datasets. This paper proposes an HPC-based expert system tailored for 'big data analysis' in the retail industry, providing various data science methods and tools to speed up the data analysis with interfaces to Cloud-based services. Our expert system leverages modular supercomputing (i.e., from the DEEP series of HPC projects) to enable fast analysis using parallel and distributed algorithms such as association rule mining (i.e., FP-GROWTH) and recommender methods (i.e., collaborative filtering). It further enables the seamless use of accelerators of supercomputers or cloud-based systems to perform automated product tagging (i.e., RESNET-50 deep learning networks for product image analysis) to obtain color, shapes, and other features of products. We validate our expert system and its enhanced knowledge representation with commercial datasets obtained from the ON4OFF retail research project in an industry case study in the beauty sector.
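The association-rule quantities that FP-GROWTH computes at scale (support and confidence) can be illustrated with brute-force counting. The basket data below is made up, and this naive counting is only a stand-in for the parallel FP-GROWTH mining the paper actually uses.

```python
# Sketch of association-rule measures on toy retail transactions.
# support(X): fraction of baskets containing itemset X.
# confidence(X -> Y): support(X and Y) / support(X).

def support(transactions, itemset):
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    return (support(transactions, set(antecedent) | set(consequent))
            / support(transactions, antecedent))

baskets = [{"shampoo", "conditioner"},
           {"shampoo", "conditioner", "soap"},
           {"shampoo"},
           {"soap"}]
conf = confidence(baskets, {"shampoo"}, {"conditioner"})
```

FP-GROWTH arrives at the same numbers without enumerating baskets per candidate itemset, by compressing the transactions into a prefix tree, which is what makes it suitable for HPC-scale retail data.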
|J. Slak (Jozef Stefan Institute, Ljubljana, Slovenia)
Partition-of-Unity Based Error Indicator for Local Collocation Meshless Methods
Local collocation meshless methods are a class of numerical methods for solving partial differential equations that obtain the solution by approximating the unknown field locally around each computational node. Computing only small local approximations is often more cost-effective than computing a global approximation, but comes with the downside that the final solution is known only at the computational nodes, and the local approximations centered around each node do not form a continuous function. We present an efficient partition-of-unity based interpolation method for gluing the local approximations together into a smooth field. Additionally, this method can be used to construct an a posteriori error indicator, which can drive adaptive refinement of the solution in regions where the quality is insufficient.
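The partition-of-unity idea can be sketched in one dimension: local approximations centered at the nodes are blended with weights that sum to one. The Gaussian weight function and the toy linear field below are illustrative choices, not the paper's exact kernel or PDE solution.

```python
# Sketch of a partition-of-unity blend: evaluate sum_i w_i(x) * u_i(x),
# where the normalized (Shepard-style) weights w_i form a partition of
# unity, so the glued field is smooth and consistent.
import math

def pu_blend(x, nodes, local_approx, sigma=0.5):
    raw = [math.exp(-((x - xi) / sigma) ** 2) for xi in nodes]
    total = sum(raw)  # normalization makes the weights sum to one
    return sum(w / total * u(x) for w, u in zip(raw, local_approx))

# Two local approximations that both reproduce the field f(x) = 2x:
nodes = [0.0, 1.0]
locals_ = [lambda x: 2.0 * x, lambda x: 2.0 * x]
value = pu_blend(0.3, nodes, locals_)
```

Because the weights sum to one, a field reproduced by every local approximation is reproduced exactly by the blend; disagreement between neighboring local approximations is what the error indicator exploits.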
|M. Rot, G. Kosec (Jožef Stefan Institute, Ljubljana, Slovenia)
Natural Convection of Non-Newtonian Fluids in a Differentially Heated Closed Cavity
Fluids in computational hydrodynamics are often considered Newtonian, meaning they have a constant viscosity, but that is only a crude approximation for many real-world fluids. Accounting for non-Newtonian behaviour can provide a significant improvement in simulation accuracy. We present a meshless solution for the natural convection of a power-law non-Newtonian fluid driven by differentially heated cavity walls. The Navier-Stokes and heat transport equations are coupled using the Boussinesq approximation and numerically solved with generalized finite differences, explicit Euler stepping and Chorin’s projection method.
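The power-law (Ostwald-de Waele) model referenced above replaces the constant Newtonian viscosity with a shear-rate-dependent one. The parameter values in this sketch are illustrative, not taken from the paper's cavity simulations.

```python
# Sketch of the power-law viscosity model: mu_eff = K * gamma_dot**(n - 1),
# where K is the consistency index and n the flow-behaviour index.
# n < 1: shear-thinning; n > 1: shear-thickening; n == 1: Newtonian.

def effective_viscosity(shear_rate, K=1.0, n=0.8):
    return K * shear_rate ** (n - 1.0)

newtonian = effective_viscosity(4.0, K=2.0, n=1.0)  # constant viscosity
thinning  = effective_viscosity(4.0, K=2.0, n=0.5)  # drops as shear grows
```

In the convection simulation, this effective viscosity is re-evaluated from the local shear rate at every node and time step, which is where the extra cost of non-Newtonian modelling comes from.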
|D. Davidović (Ruđer Bosković Institute, Zagreb, Croatia)
An overview of dense eigenvalue solvers for distributed memory systems
Solving large-scale eigenvalue problems is a central problem in many research fields, such as electronic structure calculation, macromolecular simulations, solid-state physics, theoretical physics, and combinatorial optimization. Computing the required eigenvalues and the corresponding eigenvectors of large matrices is a challenging task requiring significant computational time. Therefore, such problems are usually executed on large computational resources consisting of many compute nodes connected with fast interconnection links and, increasingly often, equipped with accelerators such as graphics processing units. Nowadays, when the whole world races for the first exascale supercomputer and research computational appetites are bigger than ever, the need for scalable, high-performance eigenvalue solvers capable of exploiting such large distributed-memory machines is of crucial importance for further research breakthroughs. This paper gives an overview of the existing numerical linear algebra packages and libraries implementing solvers for dense eigenvalue problems, tailored for distributed-memory systems. The overview shows that numerous eigensolvers for distributed-memory systems exist; however, not many of them are capable of exploiting the full potential of modern, heterogeneous, GPU-based machines with complex memory hierarchies.
|M. Žaja, I. Čavrak (University of Zagreb, Faculty of Electrical Engineering and Computing, Zagreb, Croatia), T. Lipić (Ruđer Bošković Institute, Zagreb, Croatia)
Benchmarking Apache Beam for IoT Applications
The pervasiveness of computational and communication devices, coupled with innovative Internet of Things (IoT) application scenarios, led to a massive increase in the number of available data sources and the volume of data produced per unit of time. This contributed to the emergence of many open-source streaming data processing systems of different characteristics and diverging performances in specific usage scenarios. The heterogeneity of platforms, programming languages, and models used in such systems resulted in the prohibitively complex effort required to quickly and efficiently test their suitability for specific use cases. Apache Beam framework aims to introduce the unifying programming model for data processing systems, tackling the heterogeneity problem and allowing for fast testing of performance in specific usage scenarios and migration between different platforms.
In order to test the maturity of the Apache Beam framework and its performance in processing data from the IoT domain, we constructed a benchmarking environment employing data sets with significant spatio-temporal properties along with a representative set of streaming operations for such data. We used Apache Kafka as the data source and for result collection within different computational resource configurations hosting the execution engines. For each combination of computational resource configuration, execution engine, and streaming operation, we measured and compared performance in terms of average throughput and its variance. The results show that the evaluated Flink and Spark runners, deployed on a single machine, manifest the law of diminishing returns rather quickly with regard to the number of available cores. While the Spark runner's per-core throughput significantly outperforms the Flink runner's, the Spark runner's system throughput is consistently lower than Flink's.
|M. Jančič (Jozef Stefan Institute, Ljubljana, Slovenia), V. Cvrtila (Faculty of Mathematics and Physics, Ljubljana, Slovenia), G. Kosec (Jozef Stefan Institute, Ljubljana, Slovenia)
Discretized Boundary Surface Reconstruction
Domain discretization is an essential part of the solution procedure in numerical simulations. Meshless methods simplify the domain discretization to positioning of nodes in the interior and on the boundary of the domain. However, the shape of the boundary is often undefined and thus needs to be constructed before it can be discretized with a desired internodal spacing. Domain shape construction is far from trivial and is the main challenge of this paper. We tackle the simulation of moving boundary problems, where the lack of domain shape information can introduce difficulties. We present a solution for 2D surface reconstruction from discretization points using cubic splines, thus providing a surface description anywhere in the domain. We also demonstrate the presented algorithm in a simulation of a phase-change-like problem.
|R. Trobec, M. Depolli (Jožef Stefan Institute, Ljubljana, Slovenia)
A k-d Tree Based Partitioning of Computational Domains for Efficient Parallel Computing
Parallel computers, which are entering the exascale era thanks to an ever-increasing number of communicating processors, offer prospective opportunities for accurate solutions of large scientific problems. In scientific computing, such enormous computational power can always be harvested for more accurate or longer-running solutions of physical phenomena. Our work focuses on the decomposition of computational domains of design problems that are represented by a large set of discretization nodes, which enables the solution to be formalized as a large and sparse system of equations. The computational domain decomposition, together with a parallelized system construction and its solution, are cornerstones of an efficient parallelization. We propose a methodology, based on the k-d tree, that can efficiently partition computational domains of arbitrary geometries and is independent of discretization approaches. Besides the domain partitioning, common discretization nodes are determined that are shared among processors responsible for neighboring subdomains. The analysis of computational complexity confirms that the partitioning methodology remains efficient and scalable on parallel computers with large numbers of processors.
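The k-d-tree-style median splitting at the heart of such a partitioning can be sketched briefly. This toy version works on bare 2D points with an assumed part-size limit; the paper's methodology additionally determines the shared boundary nodes between neighboring subdomains, which is not shown here.

```python
# Sketch of k-d-tree domain partitioning: recursively split the node set
# at the median along alternating coordinate axes until each part is
# small enough to assign to one processor.

def kd_partition(points, max_size, depth=0):
    if len(points) <= max_size:
        return [points]                    # leaf: one subdomain
    axis = depth % 2                       # alternate x / y splitting axis
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                    # median split keeps parts balanced
    return (kd_partition(pts[:mid], max_size, depth + 1)
            + kd_partition(pts[mid:], max_size, depth + 1))

points = [(x * 0.1, y * 0.1) for x in range(4) for y in range(4)]
parts = kd_partition(points, max_size=4)
```

Median splitting guarantees balanced subdomain sizes regardless of the geometry of the node cloud, which is what makes the approach independent of the discretization method.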
|I. Vasileska, L. Bogdanović, L. Kos (Faculty of Mechanical Engineering, Ljubljana, Slovenia)
Particle-in-Cell Code for GPU Systems
Particle simulation in the field of nuclear fusion is a well-established technique which has spawned dozens of codes around the world over the years (e.g. BIT1, VPIC, VSIM, OSIRIS, REMP, EPOCH, SMILEI, FBPIC, GENE, WARP, PEPC) with varying degrees of specialization for different physics areas and accessibility. Particle-in-cell (PIC) codes simulate numerous plasma phenomena on HPC systems. Today, flagship supercomputers feature multiple GPUs per compute node to achieve unprecedented computing power at high power efficiency. PIC codes require new algorithm design and implementation to exploit such accelerated platforms. In this work, we design and optimize a simple PIC code called SIMPIC to run on a general GPU compute node. First, we provide a fully GPU-enabled SIMPIC code and show that its run time is reduced by 50% compared to the CPU version. In the future, this code will be used as a test example for porting other, more complex PIC codes from CPU to GPU.
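The kernel a PIC code spends most of its time in is the particle push, sketched here in its simplest form. This 1D, unit-charge, uniform-field version is an illustrative simplification; SIMPIC's actual field solve, interpolation, and GPU implementation are not reproduced.

```python
# Sketch of a leapfrog-style particle-push step: advance velocity from
# the electric field, then drift the position with the new velocity.
# Unit charge-to-mass ratio and a constant 1D field are assumed.

def push(x, v, E, dt):
    v_new = v + E * dt        # kick: acceleration from the field (q/m = 1)
    x_new = x + v_new * dt    # drift with the updated velocity
    return x_new, v_new

x, v = 0.0, 1.0
for _ in range(3):            # three time steps in a constant field
    x, v = push(x, v, E=0.5, dt=0.1)
```

On a GPU, this loop body is applied to millions of particles in parallel, which is why the push maps so well onto accelerated nodes.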
|Ž. Jeričević (KMS Technologies, Houston, United States)
Fitting Sum of Exponentials to Experimental Data II: Global Approach Using Linearization by Numerical Integration
A method to resolve the components of multiexponential decay based on a combination of the Knutson Global method and linearization by numerical integration is presented. The method is general, numerically robust and fast (practically real time). Removal of ill-conditioning through use of a Global approach, noise attenuation and method flexibility are analyzed in detail. The applicability of the method in the analysis of relaxation processes, such as fluorescence spectroscopy and medical imaging, is presented.
|J. Radešček, M. Depolli (Jožef Stefan Institute, Ljubljana, Slovenia)
Developer-Centric Design of Branch and Bound Algorithm
We present a C++ template for writing branch and bound algorithms that are efficient yet easy to read and understand. The approach separates the logic of an abstract methodology for dealing with large search spaces efficiently from the concrete implementation of several algorithms. We describe how we designed and implemented an abstract branch and bound method, in both sequential and parallel versions, and then used it to implement algorithms for finding a k-clique, finding a maximum clique, and listing all maximal cliques. The algorithm for finding a maximum clique is the main goal, since we also have access to the code of a state-of-the-art parallel algorithm and can use it as a reference for comparison with the new algorithm. Experiments on 36 input graphs of varying difficulty and on up to 32 CPU threads show that the new algorithm is not much slower than the much more optimized reference algorithm. By developing two more algorithms on the branch and bound template, we demonstrate that the proposed abstract methodology is not only efficient but also easy to use, and can facilitate the development of closely related algorithms.
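The abstract branch-and-bound skeleton that such a template factors out can be sketched on a toy problem. The stand-in task below (pick the largest-value item subset within a weight budget) is illustrative only and is not one of the paper's clique algorithms, which follow the same branch/bound/prune structure.

```python
# Sketch of the branch-and-bound pattern: branch over decisions, keep an
# incumbent best, and prune branches whose optimistic bound cannot beat it.

def branch_and_bound(items, budget):
    """items: list of (value, weight); maximize value with total weight <= budget."""
    best = [0]

    def bound(value, remaining_values):
        return value + sum(remaining_values)   # optimistic: take everything left

    def branch(i, value, weight):
        if weight > budget:
            return                              # prune: infeasible branch
        best[0] = max(best[0], value)
        if i == len(items):
            return
        if bound(value, [v for v, _ in items[i:]]) <= best[0]:
            return                              # prune: bound cannot beat incumbent
        v, w = items[i]
        branch(i + 1, value + v, weight + w)    # branch: take item i
        branch(i + 1, value, weight)            # branch: skip item i

    branch(0, 0, 0)
    return best[0]

result = branch_and_bound([(6, 4), (5, 3), (4, 2)], budget=5)
```

The template's value is that only `bound` and `branch` change between problems (k-clique, maximum clique, maximal clique listing), while the search management, and its parallelization, stay shared.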
Karolj Skala (Croatia), Aleksandra Rashkovska Koceva (Slovenia), Davor Davidović (Croatia)
Marian Bubak (Poland), Jesús Carretero Pérez (Spain), Tiziana Ferrari (Netherlands), Dieter Kranzlmüller (Germany), Ludek Matyska (Czech Republic), Dana Petcu (Romania), Uroš Stanič (Slovenia), Tibor Vámos (Hungary), Matjaž Veselko (Slovenia), Yingwei Wang (Canada)
Enis Afgan (Croatia), Viktor Avbelj (Slovenia), Davor Davidović (Croatia), Matjaž Depolli (Slovenia), Simeon Grazio (Croatia), Marjan Gusev (North Macedonia), Vojko Jazbinšek (Slovenia), Jurij Matija Kališnik (Germany), Zalika Klemenc-Ketiš (Slovenia), Dragi Kocev (Slovenia), Gregor Kosec (Slovenia), Miklos Kozlovszky (Hungary), Lene Krøl Andersen (Denmark), Tomislav Lipić (Croatia), Željka Mihajlović (Croatia), Panče Panov (Slovenia), Tonka Poplas Susič (Slovenia), Aleksandra Rashkovska Koceva (Slovenia), Karolj Skala (Croatia), Viktor Švigelj (Slovenia), Ivan Tomašić (Sweden), Roman Trobec (Slovenia), Roman Wyrzykowski (Poland)
Registration / Fees:
The discount doesn't apply to PhD students.
|Price in EUR
Up to 13 September 2021
From 14 September 2021
|Members of MIPRO and IEEE
|Students (undergraduate and graduate), primary and secondary school teachers
Rudjer Boskovic Institute
Center for Informatics and Computing
HR-10000 Zagreb, Croatia
All submitted papers will go through a plagiarism check and a blind peer-review process with at least two international reviewers.
Based on the reviewers' opinions and the votes of conference attendees, the best paper will be selected for a prize awarded as part of the final event of the DS-BE conference.
Accepted papers will be published in the ISSN registered conference proceedings. Presented papers will be submitted for inclusion in the IEEE Xplore Digital Library.
JOURNAL SPECIAL ISSUE
Authors of the best scientific papers will be invited to submit an extended version of their work to the Scalable Computing: Practice and Experience (ISSN 1895-1767) Journal.
Opatija, with its 170-year-old tourism tradition, is the leading seaside resort of the Eastern Adriatic and one of the most famous tourist destinations on the Mediterranean. With its aristocratic architecture and style, Opatija has been attracting artists, kings, politicians, scientists, sportsmen, as well as business people, bankers and managers for more than 170 years.
The tourist offer in Opatija includes a vast number of hotels, excellent restaurants, entertainment venues, art festivals, superb modern and classical music concerts, beaches and swimming pools – this city satisfies all wishes and demands.
Opatija, the Queen of the Adriatic, is also one of the most prominent congress cities in the Mediterranean, particularly important for its ICT conventions, one of which is MIPRO, which has been held in Opatija since 1979, and has attracted more than a thousand participants from over forty countries. These conventions promote Opatija as one of the most desirable technological, business, educational and scientific centers in South-eastern Europe and the European Union in general.
For more details, please visit www.opatija.hr and visitopatija.com.