Digital archive of theses discussed at the University of Pisa


Thesis etd-01292021-184347

Thesis type
Master's thesis
Thesis title
Health monitoring and recovery in embedded hypervisors
Course of study
Supervisor Prof. Buttazzo, Giorgio C.
Supervisor Dr. Biondi, Alessandro
Supervisor Ing. Cicero, Giorgiomaria
  • Embedded Systems
  • Virtualization
  • Hypervisor
  • Mixed-criticality
  • Cyber-physical systems
  • Safety
  • Isolation
  • Monitor
  • Recovery
Graduation session start date
Release date
Software complexity in embedded systems is continuously increasing, while embedded computing platforms are becoming more powerful and heterogeneous in order to perform high-performance computations within limited power budgets. Modern embedded software systems are composed of subsystems with different levels of criticality and security, which makes the adoption of a single Operating System (OS) to handle all software tasks in a holistic fashion risky and inefficient. For this reason, virtualization technology is establishing itself as the de facto solution to securely and safely host mixed-criticality software on the same platform, by providing a multi-domain environment in which Real-Time Operating Systems (RTOSs) may coexist, in isolation, with General Purpose OSs (e.g., Linux). Due to their large code base and software complexity, the latter are much more prone to safety and security threats than RTOSs, and hence call for continuous monitoring to detect and react to failures of different kinds.
The aim of this thesis is to design and implement hypervisor-level mechanisms to detect failures in virtual machines (VMs) and to recover failed VMs that host the Linux OS in a mixed-criticality environment. The mechanisms have been realized within CLARE-Hypervisor, a fully static type-1 real-time hypervisor targeting cyber-physical systems on heterogeneous platforms.
The main idea is to detect Linux crashes and perform a warm reset of the corresponding VM, while keeping the entire system up such that the other VMs in the system can continue to run without experiencing any unwanted interference. The proposed monitoring technique is based on two different approaches: synchronous and asynchronous fault detection. The former is based on a direct notification to the hypervisor that the execution flow of the VM ended up in the kernel panic code section. The latter is based on a watchdog timer implemented at the hypervisor level with refresh notifications sent by the VM by means of a Linux kernel module.
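The two detection paths described above can be illustrated with a minimal sketch in C. This is not the CLARE-Hypervisor implementation: the type `vm_monitor_t`, the functions `monitor_init`, `monitor_refresh`, `monitor_notify_panic`, `monitor_check`, and `monitor_rearm`, and the tick-based time model are all hypothetical names and assumptions introduced only for illustration. The sketch assumes that the hypervisor keeps one monitor record per VM, that the guest kernel module triggers `monitor_refresh` periodically, and that the kernel panic path triggers `monitor_notify_panic`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-VM health states (illustrative names only). */
typedef enum { VM_HEALTHY, VM_FAILED_PANIC, VM_FAILED_TIMEOUT } vm_health_t;

/* Hypothetical per-VM monitor record kept at the hypervisor level. */
typedef struct {
    uint64_t timeout_ticks; /* watchdog period configured for this VM */
    uint64_t last_refresh;  /* tick of the most recent refresh */
    vm_health_t health;     /* current health state */
} vm_monitor_t;

/* Arm the watchdog for a VM that has just booted. */
static void monitor_init(vm_monitor_t *m, uint64_t timeout_ticks, uint64_t now)
{
    assert(timeout_ticks > 0);
    m->timeout_ticks = timeout_ticks;
    m->last_refresh = now;
    m->health = VM_HEALTHY;
}

/* Asynchronous path: the guest kernel module refreshes the watchdog.
 * Refreshes from a VM already marked as failed are ignored. */
static void monitor_refresh(vm_monitor_t *m, uint64_t now)
{
    if (m->health == VM_HEALTHY)
        m->last_refresh = now;
}

/* Synchronous path: the guest reports that it reached its panic code. */
static void monitor_notify_panic(vm_monitor_t *m)
{
    m->health = VM_FAILED_PANIC;
}

/* Periodic check; returns true when the VM needs a warm reset. */
static bool monitor_check(vm_monitor_t *m, uint64_t now)
{
    if (m->health == VM_HEALTHY &&
        now - m->last_refresh >= m->timeout_ticks)
        m->health = VM_FAILED_TIMEOUT;
    return m->health != VM_HEALTHY;
}

/* Re-arm the watchdog once the VM has been reset. */
static void monitor_rearm(vm_monitor_t *m, uint64_t now)
{
    m->last_refresh = now;
    m->health = VM_HEALTHY;
}
```

In this sketch a VM configured with a timeout of 100 ticks and last refreshed at tick 50 is still considered healthy at tick 149 and is flagged as failed at tick 150. A panic notification flags the VM immediately, without waiting for the timeout, which is the practical difference between the synchronous and the asynchronous path.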
Once a failure is detected, a warm reset of the failed VM is performed without directly involving the hypervisor, since the recovery procedure runs within the context of the same VM. In this way, the workload of the recovery procedure is handled as normal VM execution, thus preserving all the isolation properties configured for that VM.
Finally, experimental results are reported to demonstrate the feasibility of reliable and interference-free monitoring and recovery mechanisms for mixed-criticality cyber-physical systems equipped with the Xilinx UltraScale+ SoC, with negligible impact on boot latency and recovery time, and negligible run-time overhead.