Fault Tolerance in a Two-State Regularity-Based Checkpointing System






Embedded real-time virtualized systems serve a wide range of functions across many industries. They can encompass multiple independent applications that must share limited computational resources, and the tasks running within these applications may differ in criticality and timing requirements. Many models have been introduced to ensure reliability and efficiency when scheduling tasks in these systems. Models in the Hierarchical Real-time Scheduling (HiRTS) framework enable the virtualization and sharing of resources, and within this framework the Regularity-based Resource Partition (RRP) model can be used to achieve transparent scheduling. Many such systems use resource-level checkpointing with rollback recovery to recover from transient faults without modifying application code. However, inserting checkpoints is known to incur high time and energy overheads. This thesis project proposes the Two-state Regularity-based Checkpointing model, a HiRTS model designed to ensure fault tolerance when scheduling independent, mixed-criticality real-time task sets on limited resources. By inserting fewer checkpoints before the first fault occurs, the system is designed to achieve a lower time overhead while still ensuring fault tolerance. Simulation-based experiments were performed using a simple implementation of the proposed scheduling model. The results indicate that the model allows independent mixed-criticality task sets to maintain real-time performance guarantees for their high-priority tasks, even under a high fault rate. In addition, the results show that task sets unaffected by faults do not suffer delays, even when other task sets experience an elevated number of faults.
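To make the overhead argument concrete, the following Python sketch simulates a single job under a simplified, hypothetical two-state checkpointing policy: checkpoints are taken every `normal_interval` work units until the first transient fault, then every `alert_interval` units afterwards, and each fault rolls execution back to the last checkpoint. The names `Policy`, `run_job`, `normal_interval`, and `alert_interval`, the unit-time execution model, and the rule that a fault is detected at the end of the work unit in which it strikes are all illustrative assumptions. This is not the thesis's RRP-based scheduler, which additionally manages resource partitions and multiple task sets.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    # Hypothetical parameters, not taken from the thesis.
    normal_interval: int  # work units between checkpoints before the first fault
    alert_interval: int   # work units between checkpoints after the first fault
    ckpt_cost: int        # time units consumed by one checkpoint


def run_job(wcet: int, policy: Policy, fault_times: list[int]) -> int:
    """Return the elapsed time needed to complete `wcet` units of work.

    A fault whose instant has been reached is detected at the end of the
    current work unit; progress then rolls back to the last checkpoint.
    """
    faults = sorted(fault_times)
    clock = 0          # elapsed time, including checkpoint overhead
    progress = 0       # work completed so far
    saved = 0          # work completed as of the last checkpoint
    since_ckpt = 0     # work units executed since the last checkpoint
    faulted = False    # False = normal state, True = state after the first fault
    next_fault = 0

    while progress < wcet:
        # Execute one unit of work.
        clock += 1
        progress += 1
        since_ckpt += 1

        if next_fault < len(faults) and clock >= faults[next_fault]:
            next_fault += 1
            faulted = True      # switch permanently to the denser interval
            progress = saved    # roll back to the last checkpoint
            since_ckpt = 0
            continue

        interval = policy.alert_interval if faulted else policy.normal_interval
        # Skip the checkpoint once the job has finished.
        if since_ckpt >= interval and progress < wcet:
            clock += policy.ckpt_cost
            saved = progress
            since_ckpt = 0

    return clock


if __name__ == "__main__":
    always_dense = Policy(normal_interval=2, alert_interval=2, ckpt_cost=1)
    two_state = Policy(normal_interval=5, alert_interval=2, ckpt_cost=1)
    print(run_job(10, always_dense, []))   # 14
    print(run_job(10, two_state, []))      # 11
    print(run_job(10, always_dense, [7]))  # 15
    print(run_job(10, two_state, [7]))     # 14
```

With these made-up numbers, the two-state policy finishes a fault-free 10-unit job in 11 time units versus 14 for a policy that always checkpoints densely, and it still finishes first after one fault (14 versus 15). In general, the sparser pre-fault interval risks losing more work to the first fault, which is why the sketch switches to the denser interval once a fault has been observed.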



Keywords: Real-time systems, Embedded systems, Fault tolerance, Mixed-criticality, Hierarchical scheduling