PhD Thesis: BDA-Lab

BOLT-GAN: A Multivariate Time Series Generative Adversarial Network

Time-series data often arises during the monitoring and evaluation of ongoing industrial processes. Time-series forecasting requires accurate data modelling that describes inherent structures such as trend, cycle, and seasonality by collecting the historical data points of a time series and modelling them stochastically. In this work, we are concerned with time-series data that is limited, imbalanced, and not readily available for accurate machine learning tasks, e.g., online fraud data or network intrusion data. In this scenario, time series can be modelled through generative modelling in deep learning. Abundant temporal data can then be generated and used in different ways to achieve application-level forecasts and predictions. We focus on the use of Generative Adversarial Networks (GANs) to model and generate limited real-world time-series data. We find that this is a relatively new research domain, in which research trends generally focus on feeding real data to the GAN to generate or forecast the time series in a supervised manner. We propose Bolt-GAN, a novel GAN architecture that is completely unsupervised, i.e., it generates imbalanced time-series data from a (Gaussian) noise distribution as input, without any additional input vector of real data. Moreover, Bolt-GAN has a feedback mechanism through which it improves its performance by using historically generated time series. Using different experimental configurations, we demonstrate that Bolt-GAN generates realistic data on three standard datasets and improves the accuracy of standard machine learning algorithms, i.e., the prediction error is lower on our selected datasets when they are augmented by Bolt-GAN than on the original (unaugmented) datasets.

A Novel Framework for Concept Drift Detection using Autoencoders for Classification Problems in Data Stream

In streaming data environments, data characteristics and probability distributions are likely to change over time, a phenomenon called concept drift that makes it difficult for machine learning models to predict accurately. In such non-stationary environments, concept drift must be detected and the model updated to maintain acceptable predictive performance. Existing approaches to drift detection have inherent problems, such as the need for ground-truth labels in supervised detection methods and a high false-positive rate in unsupervised drift detection. In this research work, we propose a semi-supervised Autoencoder-based Drift Detection Method (AEDDM) that aims to detect drift without the need for ground-truth labels, yet with high confidence that the detected drift is real. In a binary classification setting, AEDDM uses two autoencoders in a layered architecture, trained on labelled data, and uses a thresholding mechanism based on reconstruction error to signal the presence of drift. The proposed method has been evaluated on four synthetic and four real-world datasets with different drifting scenarios. For the real-world datasets, the induced and detected drifts have been evaluated from the viewpoint of classifier performance using seven commonly used batch classifiers, as well as from an adaptation perspective in an online learning environment using the Hoeffding Tree classifier. The results show that AEDDM effectively detects the distributional changes in data that are most likely to impact the classifier’s performance (real drift) while ignoring virtual drift, thus considerably reducing false alarms, with an ability to adapt in terms of classification performance.
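
The idea of signalling drift by thresholding reconstruction error can be illustrated with a minimal sketch. It is not the AEDDM implementation, and it makes several assumptions that the abstract does not state: a PCA-based linear autoencoder stands in for a trained neural autoencoder, one autoencoder is fitted per class rather than in the layered arrangement, the threshold is the mean plus three standard deviations of the reference errors, and a batch is flagged when more than 20% of its samples exceed that threshold. All names (LinearAutoencoder, fit_threshold, signals_drift) and the synthetic data are hypothetical.

```python
import numpy as np


class LinearAutoencoder:
    """PCA-based linear autoencoder (assumption: stand-in for a neural one)."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        # Principal directions come from the SVD of the centred data.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def reconstruction_error(self, X):
        centered = X - self.mean_
        codes = centered @ self.components_.T   # encode
        recon = codes @ self.components_        # decode
        return np.mean((centered - recon) ** 2, axis=1)


def fit_threshold(errors, k=3.0):
    # Hypothetical rule: mean plus k standard deviations of reference errors.
    return errors.mean() + k * errors.std()


def signals_drift(ae_neg, ae_pos, threshold, X_batch, max_fraction=0.2):
    # A sample counts as unexplained when neither class autoencoder
    # reconstructs it within the threshold. No labels are needed here.
    err = np.minimum(ae_neg.reconstruction_error(X_batch),
                     ae_pos.reconstruction_error(X_batch))
    return bool(np.mean(err > threshold) > max_fraction)


# Synthetic reference data: each class lies near its own 2-D subspace of R^5.
rng = np.random.default_rng(0)
W0, W1 = rng.normal(size=(2, 5)), rng.normal(size=(2, 5))


def sample(W, offset, n):
    latent = rng.normal(size=(n, 2))
    return latent @ W + offset + 0.05 * rng.normal(size=(n, 5))


X0, X1 = sample(W0, 0.0, 500), sample(W1, 3.0, 500)
ae_neg = LinearAutoencoder(2).fit(X0)   # trained on labelled class-0 data
ae_pos = LinearAutoencoder(2).fit(X1)   # trained on labelled class-1 data
threshold = fit_threshold(np.concatenate([
    ae_neg.reconstruction_error(X0),
    ae_pos.reconstruction_error(X1),
]))

# One batch from the reference distribution, one from a changed distribution.
stable = np.vstack([sample(W0, 0.0, 100), sample(W1, 3.0, 100)])
drifted = rng.normal(scale=3.0, size=(200, 5))

stable_flag = signals_drift(ae_neg, ae_pos, threshold, stable)
drift_flag = signals_drift(ae_neg, ae_pos, threshold, drifted)
print(stable_flag, drift_flag)
```

In this sketch, labels are used only when fitting the two autoencoders; the drift check itself looks only at unlabelled feature vectors, which mirrors the semi-supervised setting described above. The threshold rule and the batch fraction are tuning choices made for illustration only.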