Abstract: The growth of Large Language Models (LLMs) has necessitated large-scale distributed training. Even highly optimized frameworks, however, suffer significant losses in Model FLOPs Utilization (MFU) ...