AI Signals From Tomorrow

APOLLO: Memory Efficient LLM Training


The podcast describes APOLLO and APOLLO-Mini (based on the paper https://arxiv.org/pdf/2412.05270), memory-efficient optimizers designed to overcome the significant memory demands of training Large Language Models (LLMs) with standard optimizers such as AdamW. AdamW's optimizer state consumes a large amount of GPU memory, which limits the size of models that can be trained or slows training on a given hardware budget. APOLLO addresses this by restructuring how learning rates are updated: it approximates channel-wise learning-rate scaling using a low-rank auxiliary optimizer state built from random projections, saving memory while avoiding computationally expensive operations such as SVD. APOLLO-Mini is an even more memory-efficient variant that uses only a rank-1 auxiliary space.
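To make the idea concrete, here is a minimal sketch of an APOLLO-style update for a single 2-D weight matrix: Adam-style moments are kept only for a random low-rank projection of the gradient, and a per-channel scaling factor derived from that small state is applied to the raw gradient. The class name, scaling rule, and hyperparameter defaults are illustrative assumptions, not the authors' released implementation.

```python
import torch

class ApolloLikeOptimizer:
    """Sketch of an APOLLO-style update (illustrative, not the official code).

    Full AdamW moments for an (n, m) weight would need two (n, m) tensors;
    here the moments live in a projected (n, rank) space instead, and only
    per-row (channel-wise) scaling factors are derived from them.
    Setting rank=1 mimics the spirit of APOLLO-Mini.
    """

    def __init__(self, param, rank=4, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        n, m = param.shape
        self.param, self.lr, self.betas, self.eps = param, lr, betas, eps
        # Fixed random projection: (m, rank). No SVD is ever computed.
        self.proj = torch.randn(m, rank) / rank ** 0.5
        self.m_state = torch.zeros(n, rank)   # first moment, projected space
        self.v_state = torch.zeros(n, rank)   # second moment, projected space
        self.t = 0

    @torch.no_grad()
    def step(self, grad):
        self.t += 1
        b1, b2 = self.betas
        r = grad @ self.proj                  # project gradient to (n, rank)
        self.m_state.mul_(b1).add_(r, alpha=1 - b1)
        self.v_state.mul_(b2).addcmul_(r, r, value=1 - b2)
        m_hat = self.m_state / (1 - b1 ** self.t)
        v_hat = self.v_state / (1 - b2 ** self.t)
        adam_r = m_hat / (v_hat.sqrt() + self.eps)
        # One scaling factor per row: how much the Adam-style rule would
        # rescale this channel, estimated entirely in the low-rank space.
        scale = adam_r.norm(dim=1) / (r.norm(dim=1) + self.eps)
        self.param.add_(grad * scale.unsqueeze(1), alpha=-self.lr)
```

The memory saving comes from the state shapes: for an (n, m) weight, the auxiliary state is (n, rank) rather than (n, m), so with rank = 1 (the APOLLO-Mini setting) the extra memory per weight matrix shrinks to a single column.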

The research indicates that these optimizers achieve memory costs close to those of simple SGD-like methods while matching or exceeding AdamW's performance when training LLMs. The stated benefits include higher training throughput, the ability to fit larger models on existing hardware, and the possibility of training LLMs on lower-end GPUs, particularly when combined with quantization.
