From Chips To Clusters: Scaling AI Efficiently By Design
| | FLOPS | HBM BW | Scale-up BW |
| Training | 70.0% | 10.0% | 20.0% |
| Inference | 45.0% | 32.5% | 22.5% |
Performance and power use per rack together imply trends in performance per watt over time for training and inference. The indices for progress in training and inference are then weighted 50:50 to get the final estimate of AMD projected progress by 2030 (20x). The performance number assumes continued AI model progress in exploiting lower-precision math formats for both training and inference, which results in both an increase in effective FLOPS and a reduction in required bandwidth per FLOP.
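As a rough illustration of how the weights in the table and the 50:50 split could fit together, the Python sketch below builds a performance index per workload, divides it by a power ratio, and averages the two results. The text does not say how the component gains are combined, so the weighted arithmetic sum here is an assumption, and the function names and the `gains` and `power_ratio` inputs are hypothetical. No roadmap figures are used.

```python
# Component weights taken from the table above.
WEIGHTS = {
    "training": {"flops": 0.70, "hbm_bw": 0.10, "scale_up_bw": 0.20},
    "inference": {"flops": 0.45, "hbm_bw": 0.325, "scale_up_bw": 0.225},
}


def performance_index(workload, gains):
    """Weighted sum of per-component gains (ASSUMED arithmetic weighting).

    `gains` maps each component name to its improvement ratio relative
    to the baseline year (1.0 means unchanged).
    """
    weights = WEIGHTS[workload]
    return sum(weights[name] * gains[name] for name in weights)


def perf_per_watt_index(workload, gains, power_ratio):
    """Performance index divided by the rack power ratio vs. the baseline."""
    return performance_index(workload, gains) / power_ratio


def combined_index(training_index, inference_index):
    """50:50 weighting of the training and inference indices."""
    return 0.5 * training_index + 0.5 * inference_index


# Sanity check with a neutral input: nothing changes, so the index stays at 1.
unchanged = {"flops": 1.0, "hbm_bw": 1.0, "scale_up_bw": 1.0}
baseline = combined_index(
    perf_per_watt_index("training", unchanged, power_ratio=1.0),
    perf_per_watt_index("inference", unchanged, power_ratio=1.0),
)
```

With every component and the rack power held at their baseline values, the combined index comes out at 1 (up to floating-point rounding), which is the reference point against which a 20x goal would be measured.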
We commissioned Dr. Koomey to analyze historical industry data and projected AMD data on compute performance and power consumption. We then worked with Dr. Koomey to develop a goal methodology aligned with industry-accepted best practices for efficiency assessments. This methodology allows us to compare our goal to historical industry gains, track our progress against the goal over time, and estimate the environmental benefits of achieving the goal in real-world AI deployment.
[iii] AMD estimated the number of racks needed to train a typical notable AI model based on EPOCH AI data. For this calculation we assume, based on these data, that a typical model takes 10^25 floating point operations to train (based on the median of 2025 data), and that this training takes place over 1 month. FLOPS needed = 10^25 FLOPs/(seconds/month)/Model FLOPs utilization (MFU) = 10^25/(2.6298*10^6)/0.6. Racks = FLOPS needed/(FLOPS/rack in 2024 and 2030). The compute performance estimates from the AMD roadmap suggest that approximately 276 racks would be needed in 2025 to train a typical model over one month using the MI300X product (assuming 22.656 PFLOPS/rack with 60% MFU), and that <1 fully utilized rack would be needed to train the same model in 2030 using a rack configuration based on an AMD roadmap projection. These calculations imply a >276-fold reduction in the number of racks needed to train the same model over this six-year period. Electricity use for an MI300X system to completely train a defined 2025 AI model using a 2024 rack is calculated at ~7 GWh, whereas the future 2030 AMD system could train the same model using ~350 MWh, a 95% reduction. AMD then applied carbon intensities per kWh from the International Energy Agency World Energy Outlook 2024. IEA's stated policy case gives carbon intensities for 2023 and 2030. We determined the average annual change in intensity from 2023 to 2030 and applied that to the 2023 intensity to get the 2024 intensity (434 g CO2/kWh) versus the 2030 intensity (312 g CO2/kWh). Emissions for the 2024 baseline scenario, 7 GWh x 434 g CO2/kWh, equate to approximately 3,000 metric tCO2, whereas emissions for the future 2030 scenario, 350 MWh x 312 g CO2/kWh, equate to around 100 metric tCO2.
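The arithmetic in this footnote can be retraced with a short Python sketch. It uses only the figures stated above; the function and variable names are illustrative and are not part of the AMD methodology. With the inputs exactly as written, the rack formula returns roughly 280, close to the ~276 quoted; the small gap presumably reflects rounding or inputs not given in the footnote.

```python
TRAINING_FLOPS = 1e25         # total operations to train a typical model
SECONDS_PER_MONTH = 2.6298e6  # value used in the footnote
MFU = 0.6                     # model FLOPs utilization
MI300X_RACK_PFLOPS = 22.656   # PFLOPS per rack, as stated in the footnote


def racks_needed(total_flops, seconds, mfu, rack_pflops):
    """Racks = required sustained FLOPS / FLOPS delivered per rack."""
    required_flops_per_second = total_flops / seconds / mfu
    return required_flops_per_second / (rack_pflops * 1e15)


def emissions_tonnes(energy_kwh, grams_co2_per_kwh):
    """Energy times carbon intensity, converted from grams to metric tons."""
    return energy_kwh * grams_co2_per_kwh / 1e6


racks = racks_needed(TRAINING_FLOPS, SECONDS_PER_MONTH, MFU, MI300X_RACK_PFLOPS)
baseline_t = emissions_tonnes(7e6, 434)    # 7 GWh at the 2024 intensity
future_t = emissions_tonnes(350e3, 312)    # 350 MWh at the 2030 intensity
energy_reduction = 1 - 350e3 / 7e6         # 350 MWh versus 7 GWh

print(f"racks: {racks:.0f}")
print(f"baseline: {baseline_t:.0f} t, future: {future_t:.0f} t")
print(f"energy reduction: {energy_reduction:.0%}")
```

The emissions results, about 3,038 and 109 metric tons, round to the "approximately 3,000" and "around 100" figures in the footnote, and the energy figures reproduce the stated 95% reduction.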
[iv]
[v] Regression analysis of achieved accuracy per parameter across a selection of model benchmarks, such as MMLU, HellaSwag, and ARC Challenge, shows that novel algorithmic techniques, such as Mixture of Experts and State Space Models, can improve the efficiency of ML model architectures by roughly 5x during the goal period. Similar numbers are quoted in Patterson, D., J. Gonzalez, U. Hölzle, Q. Le, C. Liang, L. M. Munguia, D. Rothchild, D. R. So, M. Texier, and J. Dean. 2022. "The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink." Computer. vol. 55, no. 7. pp. 18-28. Therefore, assuming innovation continues at the current pace, a 20x hardware and system design gain amplified by a 5x gain from software and algorithm advancements can lead to a 100x total gain by 2030.