description: With the rise of Big Data problems, Big Data processing, and Big Data frameworks, several general architectures have been proposed, with the Lambda and Kappa architectures being the most prominent. The goal of this seminar topic is to summarise the challenges that come with Big Data (processing) and to introduce the two architectures as well as their major differences. It shall also present an outlook on successors of the Kappa architecture.
Topic 02: Anomaly Detection in Time-Series Data: An Overview
description: In the scope of this work, techniques for the analysis of and anomaly detection in time-series data shall be explored. An overview of the current state of the art shall be given, and the advantages and disadvantages of the various techniques presented.
description: In September 2017, the Java Development Kit was released in its 9th version. The goal of this topic is to give an overview of the new features of JDK 9 at both the library and the language (syntactic) level. Special focus shall be put on the new module system, Jigsaw.
description: State machine replication is a common approach to ensuring the reliability of a (remote) service with no fail-over time in case of failures. It was proposed by Leslie Lamport in the 1970s and has been a topic of research ever since. However, apart from some use in specialised domains such as aviation, it lacks widespread adoption due to its complex implementation and the high overhead caused by the sequential execution of requests. This situation can be improved by applying deterministic scheduling to state machine implementations. This seminar thesis shall present the state machine concept and describe its shortcomings. It shall also present the idea of deterministic scheduling and explain how it solves some of these problems. Master students are expected to research, present, and compare two deterministic scheduling algorithms.
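To make the core idea concrete, here is a minimal, hedged sketch (illustrative names only, not a real state machine replication system): replicas that apply the same ordered log of commands reach the same state, and a deterministic scheduler can derive an identical partitioning of independent commands on every replica, so independent commands could run concurrently without divergence.

```python
# Minimal sketch of state machine replication: each replica applies the
# same ordered log of (key, delta) commands, so all replicas end in the
# same state. The "schedule" below is a toy illustration.

def apply_log(log):
    """Sequentially apply (key, delta) commands to a fresh state."""
    state = {}
    for key, delta in log:
        state[key] = state.get(key, 0) + delta
    return state

def deterministic_schedule(log):
    """Partition commands by key: commands on different keys are
    independent and could run concurrently, yet every replica derives
    the identical partitioning from the same log."""
    partitions = {}
    for key, delta in log:
        partitions.setdefault(key, []).append((key, delta))
    return partitions

log = [("x", 1), ("y", 2), ("x", 3)]
replica_a = apply_log(log)
replica_b = apply_log(list(log))  # same log -> same state
```

The sequential `apply_log` loop is what causes the high overhead mentioned above; the point of deterministic scheduling is that the partitioning is a pure function of the log, so concurrency does not break replica agreement.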
description: In the last decade, the software engineering world has seen a surge of orchestration tools and platforms (Docker, Ansible, Puppet), container technologies (Docker, Singularity), and others (Heat, Murano). This thesis shall provide an overview of existing tools, focusing in particular on the language formats and lifecycle support these environments provide. Master students should additionally propose an abstraction layer for them.
Topic 06: The consensus problem in distributed computing
description: Consensus algorithms have a long tradition in both theoretical and applied computing. This thesis shall introduce the general problem of distributed consensus and present two famous consensus algorithms.
description: The evolution of cloud computing has also led to changes in the database landscape. While traditional relational databases were deployed in a monolithic setup on dedicated hardware, recent database systems such as NoSQL systems promise high performance, scalability, and elasticity by running on commodity hardware in the cloud. Yet, NoSQL systems sacrifice consistency in order to achieve scalability and elasticity. An even more recent type of database system, namely NewSQL databases, promises to be as scalable and elastic as NoSQL databases while still offering strong consistency. One of the first NewSQL databases was [Google Spanner](http://dl.acm.org/citation.cfm?id=2491245), and open-source representatives such as VoltDB or CockroachDB followed. The term NewSQL and its capabilities are already being discussed in the research community, but have not yet been classified. The objective of this topic is to provide a classification of the term NewSQL and its relation to NoSQL. A starting point is provided by this paper.
Topic 08: Storage solutions for time series sensor data
description: The rise of the IoT has led to a tremendous increase in sensor data, which needs to be persisted and aggregated for further processing in back ends that typically run in the cloud. As sensor data typically comprises a timestamp, the actual sensor value, and optional metadata, new storage solutions have evolved that focus on exactly this kind of data. These storage solutions are commonly referred to as Time Series Databases (TSDBs); common representatives are InfluxDB, Prometheus, OpenTSDB, and Druid. Yet, the architectures of these databases differ significantly, and thorough evaluations of TSDBs are rare. In the scope of this topic, an overview of TSDB storage solutions shall be provided by identifying challenges and analysing existing TSDB storage solutions.
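As an illustration of the data model described above, here is a small sketch (not any specific TSDB's API; the point layout and window size are illustrative): time-series points as (timestamp, value, tags), plus the kind of downsampling aggregation that TSDBs such as InfluxDB or OpenTSDB perform server-side.

```python
# Illustrative sketch of time-series points and naive downsampling.
from collections import defaultdict

points = [
    (0,  21.5, {"sensor": "s1"}),   # timestamp in seconds, value, metadata
    (30, 22.0, {"sensor": "s1"}),
    (70, 23.0, {"sensor": "s1"}),
]

def downsample(points, window):
    """Average values per time window of `window` seconds."""
    buckets = defaultdict(list)
    for ts, value, _tags in points:
        buckets[ts // window].append(value)
    return {w * window: sum(vs) / len(vs) for w, vs in buckets.items()}
```

Downsampling like this is one of the operations whose efficient implementation distinguishes the TSDB architectures to be compared in this topic.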
The rise of cloud computing requires a change in the way business applications are deployed to their infrastructure. The sheer number of possible configurations and the error-proneness of manual deployments necessitate deployment and management automation. A common approach to achieving this kind of automation is the use of model-driven approaches, where the user starts the deployment with a model describing the application, which is then automatically deployed to the target (cloud) infrastructure and further managed at runtime.
The Topology and Orchestration Specification for Cloud Applications (TOSCA) is an OASIS standard for describing cloud applications in a platform-independent way. Around TOSCA, an ecosystem has evolved that offers tool support for managing applications described using TOSCA, e.g. OpenTOSCA, Cloudify, or Alien4Cloud.
The target of this topic is to provide an overview of the concepts of the TOSCA modelling language and its tool support by analysing the offered features, but also the tools' adherence to the TOSCA standard and cross-tool features such as interoperability.
While Cloud Computing, due to its nearly unlimited on-demand resources, allows unhindered adaptation of one's application to the end users' needs, this only holds true if the application is designed and programmed in a way that allows it to take advantage of those resources. One architectural style that achieves the required loose coupling between application components is Representational State Transfer (REST). To increase performance and reduce the amount of data transferred for the ever-increasing number of mobile devices, Facebook recently published GraphQL as open-source software. The task of this topic is to give a brief introduction to the GraphQL fundamentals and to compare it to traditional REST, focusing on the important aspects of loose coupling, implementation complexity, and performance. Additionally, it shall be researched whether real-world implementations for well-established programming languages exist.
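To illustrate the core GraphQL idea with a toy in-memory resolver (not a real GraphQL library; the data and field names are made up): the client states exactly which fields it needs and receives them in a single round trip, instead of issuing several REST requests (e.g. /users/1, then /users/1/posts) and over-fetching unused fields.

```python
# Toy illustration of GraphQL-style field selection.

db = {
    "user": {"id": 1, "name": "Ada", "email": "ada@example.org"},
    "posts": [{"title": "Hello"}, {"title": "World"}],
}

def resolve(selection):
    """Return only the requested user fields, plus optional post titles."""
    result = {f: db["user"][f] for f in selection.get("user", [])}
    if "posts" in selection:
        result["posts"] = [{f: p[f] for f in selection["posts"]}
                           for p in db["posts"]]
    return result

# One "query" fetches a tailored subset of the data in one round trip:
reply = resolve({"user": ["name"], "posts": ["title"]})
```

This tailored fetching is the performance argument to be weighed against REST's simpler, resource-per-endpoint contract.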
To ease the management of large and distributed data centres, the term data centre operating system has arisen. Such systems provide resource management features for clusters of computers similar to what a "classic" operating system does for a single computer. To this end, they provide a scheduling system in which the users' jobs/tasks are mapped to the available resources. The task of this topic is to research the principle behind the term data centre operating system and to provide an overview of existing implementations.
The following resources may provide a good starting point:
Amid ever-increasing data volumes, IT loads, and service-level requirements that have to be guaranteed in data center operations, new challenges continuously arise that impede the high availability of the offered services. One of these challenges is the increasing energy demand of the IT components. In order to keep costs low and stay competitive, data center operators increasingly have to consider acquiring energy-efficient hardware as well as intelligent solutions on the hardware and software side to save energy. The task of this seminar topic is to research the various possibilities that exist today to save energy at the IT level in data center operations. Keywords are e.g. thin provisioning, data deduplication, consolidation of network equipment, etc. Besides a summary of the options, it is expected that a qualitative and, where possible, also a quantitative evaluation of the energy-saving options is carried out.
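As a concrete example of one of these keywords, here is a minimal sketch of content-based data deduplication (chunk size and hashing scheme are illustrative simplifications): identical chunks are stored once and referenced by their hash, reducing the storage footprint and hence the energy spent on disks.

```python
# Minimal sketch of content-based data deduplication.
import hashlib

CHUNK = 4  # unrealistically small chunk size, for illustration only

def dedup_store(data, store):
    """Split `data` into chunks, store each unique chunk once,
    and return the list of chunk hashes referencing it."""
    refs = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # stored only if not seen before
        refs.append(digest)
    return refs

store = {}
refs = dedup_store(b"abcdabcdabcd", store)  # three identical chunks
```

Here three logical chunks collapse into one stored chunk; real systems add variable-sized chunking and reference counting, which a quantitative evaluation would have to account for.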
The following reference could be used as a starting point:
description: The rise of Big Data and the IoT has led to a heterogeneous landscape of modern (distributed) database management systems (DBMS). These modern DBMS are typically classified, based on their data model, into NoSQL and NewSQL DBMS. NoSQL and NewSQL DBMS promise to cater for non-functional features such as scalability, elasticity, and high availability. Yet, the quantitative evaluation of these non-functional features is challenging, especially in the context of the new application domains of Big Data and IoT. Hence, the scope of this topic is to provide an overview of recent advances in Big Data and IoT workloads for evaluating the non-functional features of modern DBMS. The overview should provide a summary and classification of the analysed workloads. The following reference could be used as a starting point:
description: The rise of Big Data and the IoT has led to a heterogeneous landscape of modern (distributed) database management systems (DBMS). These modern DBMS are typically classified, based on their data model, into NoSQL and NewSQL DBMS. NoSQL and NewSQL DBMS promise to cater for non-functional features such as performance, scalability, elasticity, and high availability. As the DBMS landscape is constantly evolving and the DBMS extend and improve their non-functional feature sets, regular quantitative evaluations of the DBMS are required. The scope of this topic is to provide an overview of recent evaluations of NoSQL and NewSQL DBMS in the context of Big Data and IoT. The overview shall focus on up-to-date publications (2016 and newer) and classify the evaluations based on their workload and the evaluated non-functional features. The following reference could be used as a starting point:
description: With the rise of large-scale data centres, the amount of data produced to monitor the resources and their utilisation increases as well. Vast amounts of monitoring data are collected from sensors on servers, switches, routers, etc.
Time Series Databases store these measurements and allow system administrators to query, aggregate, and visualise the monitoring metrics. The main purpose is to find bottlenecks and to analyse the overall data centre utilisation. In order to automate resource scheduling, placement, or alerting based on such monitoring data, a transformation into profiles is beneficial. These profiles reduce the dimensionality of the raw time-series data, which allows optimisers and simulators to work with the data.
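A hedged sketch of such a transformation (the chosen features are illustrative, not a prescribed profile format): a raw utilisation series is reduced to a handful of numbers that an optimiser or simulator can consume directly.

```python
# Sketch: reduce a raw utilisation time series to a compact profile.

def build_profile(samples):
    """Reduce raw utilisation samples (0..100 percent) to a profile."""
    return {
        "mean": sum(samples) / len(samples),          # average load
        "peak": max(samples),                         # worst-case load
        "p_busy": sum(s > 80 for s in samples) / len(samples),  # share of busy samples
    }

cpu = [10, 20, 90, 100, 30, 10]   # illustrative raw samples
profile = build_profile(cpu)
```

Three features instead of thousands of raw samples is exactly the dimensionality reduction the description refers to; which features to keep is part of the research question.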
description: In the scope of this work, measures for estimating the predictability, seasonality, and "quality" of time-series data shall be explored. An analysis of the different possibilities shall be presented, and ideally the most suitable techniques summarised based on the state of the art.
Topic 17: Cloud/Edge Environments for Autonomous Driving: Is It a Thing?
description: This work offers several possible research directions, ranging from surveying the state of the art to evaluations of computation offloading and remote training routines. A more concrete topic and strategy can be composed with the supervisor.
Topic 18: A comparative analysis of IoT programming environments
description: IoT environments are growing larger and are hence becoming more and more complex. In consequence, the complexity of their implementation grows as well. The goal of this thesis topic is to discuss and compare existing IoT programming (and ideally runtime) environments based on established scientific as well as industrial metrics.
Topic 19: An analysis of optimisation algorithms for resource allocation in Cloud Data Centres
description: A data centre operator aims to maximise its resource utilisation while using a minimum of servers, energy, and resources in order to increase profit. Users aim to optimise at the Cloud application level by scaling up and down according to user demands. Another aspect of optimisation can be seen in cross-data-centre solutions, where Cloud applications are distributed across several vendors in order to avoid vendor lock-in and increase reliability.
The placement and migration problem of VMs in a Cloud data centre can be framed as a mathematical optimisation problem that minimises an objective function. VM allocation is considered an NP-complete problem, as a combinatorial optimisation problem has to be solved in order to achieve the targets. The key challenge for an optimisation algorithm is to deliver a good solution in a very short time. Furthermore, the found configuration must be stable enough to last sufficiently long to avoid continuous migration of VMs across the data centre.
There is a variety of optimisation approaches, such as convex optimisation, minimum k-cut / balanced minimum k-cut, knapsack formulations, and metaheuristic-based approaches such as Ant Colony Optimisation (ACO) and Simulated Annealing (SA). It is important to select an optimisation technique that can be used for the validation of any resource allocation algorithm in a data centre before its real deployment.
The topic requires an in-depth study of the optimisation algorithms/techniques currently being used for placing VMs in cloud infrastructures. The student should consider the following questions:
What makes the algorithms applicable to the VM placement scenario?
What are their pros and cons?
How optimal do the placements become with the use of those algorithms?
The ultimate goal is to find a set of best suited algorithms which can be used in a simulation environment to validate any VM placement algorithm.
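To ground the bin-packing view of VM placement, here is a sketch of first-fit decreasing (FFD), a classic heuristic for the NP-complete problem; host capacity and VM demands are illustrative numbers, and real placement would consider multiple resource dimensions.

```python
# First-fit decreasing: place each VM (by CPU demand, largest first)
# on the first host with enough remaining capacity.

def first_fit_decreasing(demands, capacity):
    """Return a list of hosts, each a list of the VM demands placed on it."""
    hosts = []
    for demand in sorted(demands, reverse=True):
        for host in hosts:
            if sum(host) + demand <= capacity:
                host.append(demand)
                break
        else:
            hosts.append([demand])  # no existing host fits: open a new one
    return hosts

placement = first_fit_decreasing([5, 7, 3, 2, 6], capacity=10)
```

FFD runs in near-linear time and is provably within a constant factor of optimal for one-dimensional bin packing, which is why heuristics of this family are common baselines in placement studies.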
description: Over the last years, the adoption of Cloud infrastructure has not only increased in numbers; more and more resource-demanding and business-critical applications, such as High Performance Computing (HPC) simulations or Data-Intensive Computing (DIC) applications, are being moved from dedicated infrastructure towards shared Cloud-based solutions. These applications require large amounts of compute and storage resources and are often executed as distributed or even parallel applications involving a significant amount of low-latency communication among the hosted Virtual Machines. In order to handle the heavy network traffic generated by these resource-demanding applications and to ensure their scalability and performance in cloud environments, an intelligent and efficient resource allocation algorithm is necessary.
A common approach is to define a set of pre-defined flavours with a pre-defined number of virtual CPUs and a virtual memory capacity as static parameters. When a user has chosen a specific flavour, the deployment algorithm searches for the first fitting, or a randomly selected, host that meets the demand in terms of free memory and virtual CPUs below the maximum allowed overbooking factor. In part, storage capacity and type are considered as well when selecting an appropriate host. Other parameters such as network load or usage patterns are commonly not considered as decision parameters, as they are assumed to be sufficiently addressed by over-provisioning of network bandwidth. However, the network should be taken into account when designing a resource allocation algorithm for a cloud data centre in order to minimise network traffic and to guarantee optimal usage of compute and network resources, along with reductions in energy and cost.
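The flavour-based first-fit selection described above can be sketched as follows; the flavour definitions, host data, and the overbooking factor of 2.0 are illustrative assumptions, not values from any specific cloud platform.

```python
# Sketch of flavour-based first-fit host selection with CPU overbooking.

FLAVORS = {"small": {"vcpus": 1, "ram_gb": 2},
           "large": {"vcpus": 4, "ram_gb": 8}}
CPU_OVERBOOKING = 2.0  # virtual CPUs may exceed physical cores by this factor

def pick_host(hosts, flavor):
    """Return the first host that fits the flavour (and book it), or None.
    Each host: {"cores", "ram_gb", "used_vcpus", "used_ram_gb"}."""
    need = FLAVORS[flavor]
    for host in hosts:
        cpu_ok = host["used_vcpus"] + need["vcpus"] <= host["cores"] * CPU_OVERBOOKING
        ram_ok = host["used_ram_gb"] + need["ram_gb"] <= host["ram_gb"]
        if cpu_ok and ram_ok:
            host["used_vcpus"] += need["vcpus"]
            host["used_ram_gb"] += need["ram_gb"]
            return host
    return None

hosts = [{"cores": 2, "ram_gb": 8,  "used_vcpus": 4, "used_ram_gb": 4},
         {"cores": 8, "ram_gb": 32, "used_vcpus": 0, "used_ram_gb": 0}]
chosen = pick_host(hosts, "large")  # first host is CPU-overbooked already
```

Note that network load appears nowhere in this decision, which is exactly the gap network-aware placement schemes address.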
In recent years, research on network-aware VM placement and migration schemes has emerged. This seminar topic requires an intensive study of past and present algorithms and techniques for VM placement and migration in cloud data centres. The following questions need to be answered:
What network properties have been considered in those algorithms?
What are the placement types and constraints?
What are the objectives of those algorithms?
What performance metrics are used in their evaluation?
How effective are the algorithms?
The student should compare the placement and migration algorithms in order to identify the most fitting algorithm for today's large and highly scalable cloud data centres.
description: With the advancement of Cloud computing, a new era has begun in which a significant number of applications of different kinds are being migrated to Cloud data centres. Among those applications, some generate a heavy amount of low-latency communication among the hosted Virtual Machines. To determine the actual network resource demand of such applications and how they impact the performance of the network, it is essential to model their communication behaviour. A widely used method is to monitor and evaluate the network traffic in order to determine, and also forecast, the actual network resource utilisation of the applications. Network traffic models such as the Poisson distribution model, Markov and embedded Markov models, and the Pareto distribution process can be used to better understand both the network and the applications using it.
The seminar topic requires a comprehensive study of the well-known traffic models to identify their features, pros, and cons with respect to their use in cloud data centre networks. The ultimate goal is to find a set of best-suited traffic models that represent the communication behaviour of the applications in such networks as accurately as possible.
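A minimal sketch of the first of these models may help: under a Poisson traffic model, packet inter-arrival times are exponentially distributed with rate `lam` (arrivals per second). The rate and horizon below are illustrative; real data centre traffic is often burstier, which is what heavy-tailed models such as the Pareto process capture.

```python
# Sketch of a Poisson arrival process: exponential inter-arrival gaps.
import random

def poisson_arrivals(lam, horizon, rng=None):
    """Generate arrival timestamps in [0, horizon) for rate `lam`."""
    rng = rng or random.Random()
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(lam)  # exponentially distributed gap
        if t >= horizon:
            return arrivals
        arrivals.append(t)

# With rate 100/s over 10 s we expect roughly 1000 arrivals.
arrivals = poisson_arrivals(lam=100.0, horizon=10.0, rng=random.Random(42))
```

Comparing traces generated this way against measured traffic is one simple way to judge how well a candidate model fits a given application's behaviour.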