(MENAFN- H+K Strategies) While generative artificial intelligence and its powerful, disruptive possibilities have drawn a great deal of attention recently, one aspect of the technology that deserves closer examination is the hardware needed to deliver those transformational results.

Excitement about the possibilities of generative AI has led to a significant uptick in demand for computing power for generative AI applications and their underlying models. In fact, in the broad market, there is now a shortage of GPUs, chips that are critical for training and running many ML models.

Not only is this kind of computing power now scarce; even when companies can get access to the chips, building and using generative AI models and applications can be incredibly expensive. These models rely on billions or trillions of parameters and require massive amounts of data to train and run properly, which can be cost-prohibitive.

AWS has been working to democratize access to generative AI in a number of ways. It is introducing tools like Amazon Bedrock, which allows developers to access foundation models through an API. It has also been designing custom silicon specifically built to train machine learning models and run them so they can draw inferences. These purpose-built chips offer superior price-performance and make it quicker, less costly and less energy-intensive to train and run models that power generative AI.

AWS works with major partners including NVIDIA, Intel, and AMD to offer the broadest set of accelerators in the cloud for machine learning and generative AI applications. AWS recently announced that Amazon Elastic Compute Cloud (Amazon EC2) P5 instances, tailored for AI and ML workloads, will be powered by the latest NVIDIA H100 Tensor Core GPUs. AWS has also invested significantly over the last several years to build AI- and ML-focused chips in-house, including AWS Trainium and AWS Inferentia.

AWS Trainium chips help reduce machine learning training costs

It can take months and cost tens of millions of dollars to train generative AI foundation models with hundreds of billions of parameters. That’s why AWS introduced AWS Trainium, which is specifically designed to speed up the training of machine learning models and lower its cost by up to 50 percent. Each Trainium accelerator includes two second-generation NeuronCores that are purpose-built for deep learning algorithms.

AWS Inferentia chips help accelerate the deployment of generative AI applications

In 2018, AWS introduced AWS Inferentia, its first purpose-built chip for AI and ML inference, which is the process by which AI applications make predictions and decisions in real time. The chips power Amazon EC2 Inf1 instances, which are designed to provide high performance and cost efficiency for deep learning inference workloads. AWS Inferentia2 chips, the second generation, deliver up to four times higher throughput and up to 10 times lower latency than first-generation Inferentia chips. Earlier this year, AWS introduced Inferentia2-based Inf2 instances, which deliver up to 40 percent better price performance than other comparable Amazon EC2 instances when deploying generative AI models.

What’s key is that the Inferentia chips do all this while delivering high performance, including exceptionally high throughput and low latency. This allows customers to run generative AI applications and receive recommendations or generate content nearly instantly, which is valuable for anyone using a generative AI application.

Running generative AI applications is also highly energy-intensive, and AWS and its customers increasingly see sustainability as an important consideration. That’s why AWS designed Inf2 instances to deliver 50 percent better performance per watt than other comparable Amazon EC2 instances.
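To put the percentage figures above in concrete terms, here is a minimal arithmetic sketch. It assumes that "price performance" means work done per dollar and "performance per watt" means work done per unit of power, which the article does not define explicitly. The function name `relative_cost` is hypothetical and used only for illustration. Under those assumptions, a 40 percent gain in price performance means the same workload costs roughly 71 percent of the baseline (about 29 percent less, not 40 percent less), and a 50 percent gain in performance per watt means the same workload uses about two-thirds of the baseline energy.

```python
def relative_cost(improvement_pct: float) -> float:
    """Return the fraction of the baseline cost (or energy) needed to do
    the same amount of work after an efficiency improvement.

    Assumption (not stated in the article): the metric is "work per dollar"
    or "work per watt", so a gain of p percent multiplies it by (1 + p/100).
    """
    if improvement_pct <= -100:
        raise ValueError("improvement must be greater than -100 percent")
    return 1.0 / (1.0 + improvement_pct / 100.0)


# Figures quoted in the article for Inf2 instances.
price_fraction = relative_cost(40)   # 40 percent better price performance
energy_fraction = relative_cost(50)  # 50 percent better performance per watt

print(f"Cost for the same workload:   {price_fraction:.1%} of baseline")
print(f"Energy for the same workload: {energy_fraction:.1%} of baseline")
```

Because the price-performance figure is quoted as "up to" 40 percent, the cost fraction computed here is best read as a lower bound on cost rather than a typical outcome.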

Johnson & Johnson, OctoML see AI benefits from purpose-built AWS chips

Johnson & Johnson, like other pharmaceutical companies, is subject to strict regulatory requirements. AI and ML tools help it analyze regulatory compliance data by extracting relevant information to help the company quickly identify potential compliance issues and ensure it is adhering to regulatory standards.

Johnson & Johnson is also focused on sustainability, Menon says, noting that advanced, energy-efficient innovations in AI and ML silicon, along with other architectural energy-saving tactics, allow the company to use AI sustainably and meet its commitments to address climate change.

Some companies use customized AI and ML silicon from AWS to help their own customers gain access to generative AI models. Through its OctoAI service, OctoML gives developers access to a library of some of the fastest and most cost-effective open-source foundation models available today, including the fastest Stable Diffusion endpoint on the market and the 65-billion-parameter LLaMA language model.

Why purpose-built chips are important when choosing a cloud provider

Improving the price-performance for compute-intensive workloads while delivering high energy efficiency and ease of use is key to ensuring that more customers can realize the full promise of generative AI technology.

Experts at research firms such as Forrester Research suggest that organizations should consider a cloud service provider’s complete technology stack, as that is where much of the innovation occurs in the public cloud market and where different players’ level of investment varies the most.

One of the leading factors Forrester recommends organizations consider is purpose-built silicon that cloud providers like AWS are producing. Increasingly, hardware innovation, and silicon in particular, is serving as a differentiating factor for cloud providers and allowing organizations to pursue new use cases.




