Master of Computer Science Projects
High-performance computing platform for DNA analysis
The aim of the project is to build a forensic DNA database and a multiple-analysis tool that matches new input DNA samples against the DNA data stored in the database. The overall work in the project includes collecting various human genome samples, designing a distributed storage system to store and access the DNA, finding suitable algorithms to perform multiple analyses, and creating a web interface to interact with the system and visualize its findings. This report focuses mainly on the implementation of the analysis tool. The current version of the system takes a DNA sequence as input, performs a BLAST search against the DNA database selected by the user, and visualizes the matches as output.
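A production system would call a BLAST implementation such as NCBI BLAST+ against the distributed store; the pure-Python sketch below only illustrates the k-mer seeding idea that BLAST-style search starts from. The database sequence, query, and k-mer size are made-up examples, not project data.

```python
# Illustrative sketch of the seed-matching step behind BLAST-style search.
# Hypothetical sequences and k-mer size; real runs would use BLAST+ tools.

def build_kmer_index(sequence, k=4):
    """Map every k-mer in a stored sequence to its start positions."""
    index = {}
    for i in range(len(sequence) - k + 1):
        index.setdefault(sequence[i:i + k], []).append(i)
    return index

def seed_matches(query, index, k=4):
    """Return (query_pos, db_pos) pairs where a k-mer seed matches."""
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits

db_seq = "ACGTACGTTAGC"            # hypothetical stored sample
query = "CGTTAG"                   # hypothetical input sample
index = build_kmer_index(db_seq)
hits = seed_matches(query, index)  # seeds would then be extended and scored
```

In full BLAST, each seed hit is extended into a local alignment and scored; here the hits alone show where candidate matches would be explored.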
Students:
- Lokesh Pathak
- Arun Kumar Rajasekar
- Versita Narayanasamy
Supervisors:
- M. Ali Babar
- Aufeef Chauhan
Cyber Security Decisions Toolkit for Boards/Executives
Cyber Security health of an organization is a reflection of multiple indicators, which are measured by security systems such as Intrusion Detection System (IDS), Security Information and Event Management (SIEM), as well as human processes such as phishing awareness campaigns. The agglomeration of these security metrics forms the Common Cyber Operating Picture (CCOP) of the organization. In general, organizations have their own security indicators, sources, and models of how indicators are fused into a CCOP. These models must be implemented as software programs in practice. This process can be error-prone and inefficient, especially if it has to be repeated in multiple organizations. This project aims to design, develop, and evaluate an automation framework called CCOP-Fuse to automate the process of fusing security metrics into CCOP.
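One simple fusion model is a weighted average of normalized indicators; the sketch below shows the shape of such a model. The indicator names, weights, and normalization are illustrative assumptions only — CCOP-Fuse is intended to let each organization plug in its own indicators and fusion model.

```python
# Hypothetical sketch of fusing security indicators into a single CCOP score.
# Indicator names and weights are illustrative, not part of CCOP-Fuse itself.

def fuse_indicators(indicators, weights):
    """Weighted average of indicators already normalized to [0, 1]."""
    total = sum(weights.values())
    return sum(indicators[name] * w for name, w in weights.items()) / total

indicators = {
    "ids_alert_rate": 0.2,        # share of hosts raising IDS alerts
    "siem_critical_events": 0.1,  # normalized SIEM critical-event volume
    "phishing_click_rate": 0.3,   # clicks in the last awareness campaign
}
weights = {"ids_alert_rate": 2, "siem_critical_events": 3, "phishing_click_rate": 1}
score = fuse_indicators(indicators, weights)
```

An automation framework would generate code like this from a declarative description of the organization's indicators, sources, and weights, rather than having engineers hand-write it per organization.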
Students:
- Anupam .
Supervisors:
- M. Ali Babar
- Nguyen Khoi Tran
Accountable Machine Learning with Blockchain
This research aims to improve the accountability and trustworthiness of machine learning models, which are increasingly prevalent in the operation and security of modern software platforms and technology. It addresses two problems: (1) developing and evaluating a machine-learning-based approach for detecting tampering of machine learning models based on the historical records generated during the training of those models, and (2) designing and developing a blockchain-based framework for persisting the historical records of machine learning models in a tamper-proof and transparent manner.
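The tamper-evidence idea can be sketched with a plain hash chain over training records: each record's hash covers the previous hash, so editing any record invalidates every later hash. This is a minimal illustration under assumed record fields (epoch, loss); the actual framework would anchor such hashes on a blockchain rather than keep them in memory.

```python
import hashlib
import json

# Minimal sketch of tamper-evident training records via a hash chain.
# Record fields are assumed examples; a blockchain would anchor the hashes.

def chain_records(records):
    """Hash each record together with the previous hash."""
    prev = "0" * 64
    hashes = []
    for rec in records:
        payload = json.dumps(rec, sort_keys=True) + prev
        prev = hashlib.sha256(payload.encode()).hexdigest()
        hashes.append(prev)
    return hashes

def verify(records, hashes):
    """Recompute the chain; any edited record changes every later hash."""
    return chain_records(records) == hashes

log = [{"epoch": 1, "loss": 0.9}, {"epoch": 2, "loss": 0.7}]
anchors = chain_records(log)       # persisted at training time
log[0]["loss"] = 0.5               # simulated tampering after the fact
tampered = not verify(log, anchors)
```

Because each anchor depends on all earlier records, an auditor holding only the anchors can detect any later modification of the training history.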
Students:
- Nini Cui
Supervisors:
- M. Ali Babar
- Nguyen Khoi Tran
- Bushra Sabir
Big data analytics with cloud computing
Students:
- Shagun Dhingra
Supervisors:
- M. Ali Babar
- Faheem Ullah
Investigating the correlation between Performance and Energy Consumption of Big Data Systems
Context: The exponential growth of digital data has resulted in the widespread adoption of cloud computing and big data analytical frameworks such as Apache Spark and Apache Flink. These frameworks support the distribution of data storage and data processing across computing nodes in a cloud cluster. In general, cloud computing focuses on achieving high performance (e.g., throughput and resource utilization) and low energy consumption. However, there is a paucity of research on how these two quality attributes relate to each other.
Objective: This project aims to investigate the correlation between performance and energy consumption of big data systems deployed on a cloud. The project will explore this correlation for popular big data frameworks (i.e., Apache Spark and Apache Flink) deployed on various cloud models such as private, public, and hybrid clouds. Furthermore, the project will use statistical methods (e.g., the ANOVA test) to determine whether a statistically significant correlation exists between performance and energy consumption. Subsequently, the project will develop a framework for maximizing the performance and minimizing the energy consumption of big data systems built with frameworks such as Apache Spark and Apache Flink.
Method: For the cloud models, the project will use OpenStack for private deployment of big data systems and Microsoft Azure as the public cloud. The project will use big data benchmark workloads (e.g., word count and page rank) to evaluate the big data systems, following available guidelines for configuring big data frameworks on various cloud models. The key performance-related quality attributes considered in this project are latency, throughput, and scalability. Open-source tools will be used to measure the energy consumption of the underlying computing cluster.
Advantages: (i) gaining hands-on experience in big data analytics, cloud computing, and statistical analysis; (ii) developing skills in working with big data benchmark workloads; (iii) potential publication in a top venue such as the journal Future Generation Computer Systems.
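The correlation analysis can be sketched as below with a pure-Python Pearson coefficient; the throughput and energy numbers are invented placeholders, and in practice the project would run scipy.stats routines (e.g., pearsonr, f_oneway for ANOVA) on real benchmark measurements.

```python
import math

# Illustrative Pearson correlation between throughput and energy readings.
# The measurements are hypothetical; real runs would come from benchmarks.

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

throughput = [120, 150, 180, 210, 240]   # records/s, hypothetical
energy = [300, 340, 390, 430, 480]       # joules per run, hypothetical
r = pearson(throughput, energy)          # close to +1 for these values
```

A coefficient near +1 here would suggest that, for this workload, higher throughput comes with proportionally higher energy use — exactly the kind of relationship the statistical tests are meant to confirm or reject.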
Profile: Students with an interest in big data analytics, cloud computing, and statistical analysis, and with skills in a programming language such as Python, Java, or Scala.
Students:
- Wei Zhang
- Xin Chen
Supervisors:
- M. Ali Babar
- Yaser Mansouri
- Bushra Sabir
The impact of node failures on the performance of big data application
These days, big data has become very important worldwide. Every company tries to maximize the potential of big data to benefit from the key insights and useful knowledge provided by big data analytics. To meet the extensive computational demands of big data analytics systems and to take full advantage of cloud computing, many companies want to implement big data analytics systems on the cloud. However, a question arises: “What are the performance and cost of different cloud computing models for implementing different big data analytics systems?”. This project will address that question. In this project, I will evaluate the performance and cost of three different cloud models for implementing big data analytics systems, as well as the level of security of these models. To evaluate performance and cost, I will implement three big data frameworks on three cloud computing models. After that, I will experiment with node failure and node addition on each cloud model while running several big data workloads, and with different data replication settings, to evaluate their effects on the quality attributes of the cloud models. The data from these experiments will be analyzed to evaluate the performance and cost of the three cloud models. Lastly, I will evaluate and compare the security of the three cloud models as well as the three big data frameworks (i.e., Hadoop, Spark, and Flink). The results of this project can help companies worldwide combine their big data analytics systems with a suitable cloud computing model.
Students:
- The Trung Le
Supervisors:
- M. Ali Babar
- Faheem Ullah
ML/DL for Tuning Cloud-Enabled Big Data Systems
The volume, velocity, and variety of digital data are increasing day by day. It is expected that by 2020 the amount of digital data will reach 40 trillion gigabytes, up from merely 1.2 trillion gigabytes in 2010. Cisco estimates that the number of devices connected to the internet will increase from 18 billion in 2017 to 28.5 billion in 2022. These devices will generate large volumes of data. Traditional data analytics applications are unable to cope with the increasing volume, velocity, and variety of the data; therefore, the use of big data technologies (e.g., Hadoop and Spark) is on the rise. Big data analytics requires extensive computational power, which is provided through cloud computing. Cloud computing has three different models – public, private, and hybrid – and it is important to understand which cloud model should be used for implementing big data analytical solutions.
Students:
- Yeungsing Wong
Supervisors:
- M. Ali Babar
- Yaser Mansouri
- Bushra Sabir
Hypervisor-assisted VM activity introspection
The two main objectives of this project are to improve tooling for live VM introspection and to evaluate various ML approaches for VM activity detection. Modern public (and private) clouds typically host multiple user-controlled VMs on the same physical hardware, controlled by the same hypervisor. This poses two potential risks for end users. Firstly, a malicious user could break out of their allocated VM and affect other VMs on the same host. Secondly, having direct physical access, a malicious cloud owner or administrator could affect any VM running on their hardware. While these risks are known and described in the literature, there is currently a lack of tools to perform stealthy, run-time hypervisor-assisted attacks on a VM. Instead (typically in the forensic analysis domain), VMs are stopped and analyzed offline. This analysis can occur at numerous levels, such as the filesystem, processes, and RAM. Offline analysis, however, may be less effective because of disk encryption and the lack of live network data. Unless the encryption key is known, filesystem contents are protected from forensic analysis. Similarly, when a VM is stopped, no live network traffic can be intercepted and analyzed. Thus, implementing hypervisor-assisted online stealthy analysis for running VMs would be beneficial for forensic applications.
Students:
- Siyuan Zhang
Supervisors:
- M. Ali Babar
- Victor Prokhorenko
Master of Software Engineering Projects
The impact of cloud configuration on the configuration of big data frameworks
The volume, velocity, and variety of digital data are increasing day by day. It is expected that by 2020 the amount of digital data will reach 40 trillion gigabytes, up from merely 1.2 trillion gigabytes in 2010. Cisco estimates that the number of devices connected to the internet will increase from 18 billion in 2017 to 28.5 billion in 2022. These devices will generate large volumes of data. Traditional data analytics applications are unable to cope with the increasing volume, velocity, and variety of the data; therefore, the use of big data technologies (e.g., Hadoop and Spark) is on the rise. Big data analytics requires extensive computational power, which is provided through cloud computing. Both the big data framework and the cloud require configuration tuning. Big data frameworks come with many parameters (e.g., executor memory, replication factor, and data compression options), which need to be tuned for optimal performance. Similarly, the cloud cluster has several parameters (e.g., number of nodes in the cluster, heap size, and flavour of computing nodes), which are tuned to use the cloud to the maximum of its potential. However, there is a lack of understanding of whether there is a correlation between the configuration of the big data framework and the configuration of the underlying cloud. For example, if the flavour of the nodes is small (4 GB RAM), should the replication factor of the big data framework be tuned to 2 or 3 to achieve the optimal response time? This project aims to investigate the configuration tuning of the big data framework with respect to the configuration of the underlying cloud. For the investigation, the project will implement three big data frameworks (i.e., Hadoop, Spark, and Flink) on our private cloud, and will investigate the correlation between 12 big data framework configuration parameters and 7 cloud configuration parameters.
The project will use three batch benchmark workloads (i.e., word count, page rank, and sort) and two iterative benchmark workloads (i.e., K-means and linear regression) for evaluation.
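The experimental design amounts to benchmarking every combination of framework and cloud settings. The sketch below generates such a grid for a few assumed parameters — the names and value sets are illustrative placeholders, not the actual 12 framework and 7 cloud parameters studied in the project.

```python
import itertools

# Hypothetical sketch of the experimental grid pairing framework parameters
# with cloud parameters. Parameter names and values are illustrative only.

framework_params = {
    "replication_factor": [2, 3],
    "executor_memory_gb": [2, 4],
}
cloud_params = {
    "node_flavour_gb": [4, 8],
    "cluster_size": [3, 5],
}

def experiment_grid(fw, cloud):
    """Every combination of framework and cloud settings to benchmark."""
    names = list(fw) + list(cloud)
    values = list(fw.values()) + list(cloud.values())
    return [dict(zip(names, combo)) for combo in itertools.product(*values)]

runs = experiment_grid(framework_params, cloud_params)  # 2*2*2*2 = 16 runs
```

Each dictionary in `runs` describes one benchmark configuration; running a workload under each one and correlating the settings with the measured response time is the core of the study.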
Students:
- Sharon Khate Damaso
- Atukoralage Kusala Rajakaruna
Supervisors:
- Faheem Ullah
Edge-based Smart Parking Solution
This project evaluates various aspects of task offloading, such as energy, time, CPU utilization, and network bandwidth, in the context of IoT edge processing. A smart parking system serves as the case study for the framework to be developed.
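The offloading decision can be sketched as a toy cost model comparing local execution against sending the task to an edge node. All parameter values below (CPU cycles, bandwidth, power figures, edge latency) are assumed for illustration, not measurements from the parking case study.

```python
# Toy cost model for the offloading decision; all figures are hypothetical.

def local_cost(cycles, cpu_hz, power_w):
    """Time and energy to run the task on the device itself."""
    t = cycles / cpu_hz
    return t, t * power_w              # (seconds, joules)

def offload_cost(bits, bandwidth_bps, tx_power_w, edge_latency_s):
    """Time and device-side energy to ship the task to an edge node."""
    tx_time = bits / bandwidth_bps
    return tx_time + edge_latency_s, tx_time * tx_power_w

t_loc, e_loc = local_cost(cycles=2e9, cpu_hz=1e9, power_w=2.0)
t_off, e_off = offload_cost(bits=8e6, bandwidth_bps=2e7, tx_power_w=0.5,
                            edge_latency_s=0.3)
decision = "offload" if (t_off, e_off) < (t_loc, e_loc) else "local"
```

With these assumed numbers the edge wins on both time and energy; the project's framework would measure the real quantities (including CPU utilization and available bandwidth) instead of fixing them as constants.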
Students:
- Trung Ky Moc
- Lantian Cai
Supervisors:
- Victor Prokhorenko
Undergraduate and Honour Projects
TBA