Wubiao Xu, Xin Huang, Shiman Meng, Weiping Zhang, Luanzheng Guo, Kento Sato: An Efficient Checkpointing System for Large Machine Learning Model Training. SC Workshops 2024: 896-900