Abstract
Several machines are being developed at the exascale level for high-performance computing. Such a machine can perform at least one exaflop, that is, 10^18 (a billion billion) floating-point operations per second. These machines make it possible to address challenging computational problems and thereby improve our understanding of nature and the universe. However, they face certain obstacles. Because an exascale machine comprises a huge number of components, it may experience frequent failures, which makes resilience a challenge. Some form of fault tolerance must be incorporated into the system to keep the progress rate of applications high. In parallel, the system must manage power carefully. All layers of the system, including the fault tolerance layer, must adhere to the system's power limit. In addition, the high power consumption of an exascale installation can be expected to result in large energy bills. The energy profiles of different fault tolerance models must therefore be analyzed. This paper evaluates three rollback-recovery fault tolerance models: checkpoint/restart, message logging, and parallel recovery. For programs written in various programming models, parallel recovery provides the most energy-efficient solution for executions with failures. Parallel recovery also completes execution faster than the other techniques. An analytical model is used to explore the behavior of these models at extreme scales.
