A fault-tolerant aware scheduling method for fog-cloud environments

Authors: Abdulaziz Alarifi aff001; Fathi Abdelsamie aff002; Mohammed Amoon aff001
Author affiliations: Department of Computer Science, Community College, King Saud University, Riyadh, Saudi Arabia aff001; Department of Electronics and Electrical Communications, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt aff002; Department of Computer Science and Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt aff003
Published in: PLoS ONE 14(10)
Category: Research Article
doi: 10.1371/journal.pone.0223902


Fog computing is a promising technology that uses resources at the cloud edge to serve requests from IoT (Internet of Things) devices. Because devices at the cloud edge are highly dynamic and heterogeneous, failures are common, and fault tolerance has become indispensable. Most earlier scheduling and fault-tolerant methods gave little consideration to time-sensitive requests, which increases the likelihood of latency in serving these requests, with unfavorable consequences. This paper proposes a fault-tolerant scheduling method (FTSM) for allocating service requests to the most suitable devices in fog-cloud IoT-based environments. The main purpose of the proposed method is to reduce the latency and overhead of services and to increase the reliability and capacity of the cloud. The method categorizes the devices that can issue requests into three classes according to the type of service required: time-sensitive, time-tolerant and core. Each time-sensitive request is mapped directly to one or more edge devices using a pre-prepared executive list of devices. Each time-tolerant request may be assigned to one or more devices at the cloud edge or the cloud core. Core requests are assigned to resources at the cloud core. To achieve fault tolerance, the proposed method selects the most suitable technique for each request from among replication, checkpointing and resubmission, whereas most existing methods consider only one technique. The effectiveness of the proposed method is assessed using average service time, throughput, operation costs, success rate and capacity percentage as performance indicators.
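
The class-to-tier mapping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' FTSM implementation: the names `RequestClass`, `Request`, `candidate_tiers` and `assign` are hypothetical, the rule "take the first available entry, edge before core" is an assumption made only for illustration, and the sketch places each request on a single device although the method allows one or more. How the fault-tolerant technique is selected is not specified in the abstract, so that step is left out.

```python
from dataclasses import dataclass
from enum import Enum


class RequestClass(Enum):
    """The three request classes named in the abstract."""
    TIME_SENSITIVE = "time-sensitive"
    TIME_TOLERANT = "time-tolerant"
    CORE = "core"


@dataclass
class Request:
    request_id: str
    request_class: RequestClass


def candidate_tiers(request_class):
    """Return the tiers on which a request of this class may be placed."""
    if request_class is RequestClass.TIME_SENSITIVE:
        return ["edge"]            # edge devices only
    if request_class is RequestClass.TIME_TOLERANT:
        return ["edge", "core"]    # either the cloud edge or the cloud core
    return ["core"]                # core requests stay at the cloud core


def assign(request, executive_list, core_resources):
    """Pick a (tier, device) pair for the request, or None if none is available.

    Assumption: the first entry of the first non-empty pool is taken, with
    edge tried before core. The abstract does not describe the actual ranking.
    """
    pools = {"edge": executive_list, "core": core_resources}
    for tier in candidate_tiers(request.request_class):
        if pools[tier]:
            return tier, pools[tier][0]
    return None


if __name__ == "__main__":
    edge = ["edge-1", "edge-2"]    # stands in for the pre-prepared executive list
    core = ["core-1"]
    print(assign(Request("r1", RequestClass.TIME_SENSITIVE), edge, core))
    # prints ('edge', 'edge-1')
```

In this sketch a time-sensitive request is never sent to the core, even when no edge device is available, which mirrors the abstract's statement that such requests are mapped directly to edge devices.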

Keywords:

Clouds – Employment – Network bandwidth – Cloud computing – Fault tolerance – Electrical faults – Actuators – Computing methods


This article was published in PLoS ONE, 2019, Issue 10