AI Signals From Tomorrow

APOLLO: Memory Efficient LLM Training


The podcast describes APOLLO and APOLLO-Mini (based on the paper https://arxiv.org/pdf/2412.05270), memory-efficient optimizers designed to overcome the significant memory demands of training Large Language Models (LLMs) with standard optimizers such as AdamW. AdamW's optimizer state consumes a large amount of GPU memory, which limits the size of models that can be trained or slows training on a given hardware budget. APOLLO addresses this by restructuring how learning rates are updated: it approximates channel-wise learning-rate scaling using a low-rank auxiliary optimizer state built from random projections, saving memory while avoiding computationally expensive operations such as SVD. APOLLO-Mini is an even more memory-efficient variant that uses only a rank-1 auxiliary space.
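To make the idea concrete, here is a minimal sketch of an APOLLO-style update for a single 2-D weight matrix: Adam-style moments are kept only for a random low-rank projection of the gradient, and a per-channel scaling factor derived from that small state is applied to the raw gradient. The class name, scaling rule, and hyperparameter defaults are illustrative assumptions, not the authors' released implementation.

```python
import torch

class ApolloLikeOptimizer:
    """Sketch of an APOLLO-style update (illustrative, not the official code).

    Full AdamW moments for an (n, m) weight would need two (n, m) tensors;
    here the moments live in a projected (n, rank) space instead, and only
    per-row (channel-wise) scaling factors are derived from them.
    Setting rank=1 mimics the spirit of APOLLO-Mini.
    """

    def __init__(self, param, rank=4, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        n, m = param.shape
        self.param, self.lr, self.betas, self.eps = param, lr, betas, eps
        # Fixed random projection: (m, rank). No SVD is ever computed.
        self.proj = torch.randn(m, rank) / rank ** 0.5
        self.m_state = torch.zeros(n, rank)   # first moment, projected space
        self.v_state = torch.zeros(n, rank)   # second moment, projected space
        self.t = 0

    @torch.no_grad()
    def step(self, grad):
        self.t += 1
        b1, b2 = self.betas
        r = grad @ self.proj                  # project gradient to (n, rank)
        self.m_state.mul_(b1).add_(r, alpha=1 - b1)
        self.v_state.mul_(b2).addcmul_(r, r, value=1 - b2)
        m_hat = self.m_state / (1 - b1 ** self.t)
        v_hat = self.v_state / (1 - b2 ** self.t)
        adam_r = m_hat / (v_hat.sqrt() + self.eps)
        # One scaling factor per row: how much the Adam-style rule would
        # rescale this channel, estimated entirely in the low-rank space.
        scale = adam_r.norm(dim=1) / (r.norm(dim=1) + self.eps)
        self.param.add_(grad * scale.unsqueeze(1), alpha=-self.lr)
```

The memory saving comes from the state shapes: for an (n, m) weight, the auxiliary state is (n, rank) rather than (n, m), so with rank = 1 (the APOLLO-Mini setting) the extra memory per weight matrix shrinks to a single column.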

The research indicates that these optimizers achieve memory costs close to those of simple SGD-like methods while matching or exceeding AdamW's performance when training LLMs. The stated benefits include higher training throughput, the ability to fit larger models on existing hardware, and the possibility of training LLMs on lower-end GPUs, particularly when combined with quantization.
