AWS introduces Parallel Computing Service for HPC


Amazon Web Services (AWS) has introduced the AWS Parallel Computing Service (PCS), a new managed service designed to help customers run high-performance computing (HPC) workloads at virtually any scale. The launch aims to streamline the setup, management, and operation of HPC clusters, so that scientists, researchers, and engineers can focus on their simulations and computational tasks rather than on the underlying infrastructure.

The service is built on Slurm, a highly scalable, fault-tolerant job scheduler widely used in the HPC community for scheduling and orchestrating simulations. Because PCS is fully managed, it removes much of the operational burden often associated with HPC environments. Ian Colle, director of advanced compute and simulation at AWS, said this kind of access could accelerate innovation and scientific discovery in fields that have traditionally depended on access to HPC clusters.

“There are a number of existing workloads today that really should be or could be taking advantage of high-performance computing resources, but because of the perception that it’s only for large enterprises or labs, whether real or perceived, is too much that people go, you know what, I don’t even want to go there,” Colle said. He believes that perception will change once companies realize how much more easily they can use HPC clusters with the new service, which should encourage more experimentation. PCS lets users set up and manage groups of Amazon Elastic Compute Cloud (EC2) instances.

AWS uses the open-source HPC workload manager Slurm and builds and maintains the clusters on behalf of system administrators. Previously, customers could access HPC clusters but had to supply their own system administrators to maintain them. Now, customers can run scientific and engineering workloads at scale using tools such as the AWS Management Console and software development kits (SDKs).

Because the service uses Slurm, users can migrate their existing Slurm workflows to PCS clusters without rearchitecting them.


Enterprises can also connect the service to other APIs.

Colle said the offering “simplifies cluster administration and unlike other products, customers can completely offload Slurm management” to the service. PCS will first be available in AWS Regions in Ohio, Northern Virginia, and Oregon in the United States; Frankfurt, Stockholm, and Ireland in Europe; and Sydney, Singapore, and Tokyo in Asia Pacific.

Demand for HPC clusters has grown as companies seek the compute power needed to train large language models and other AI foundation models.

HPC clusters serve more than the large calculations needed for drug discovery. Previously, only large government labs and very big companies had access to supercomputers, and companies like AMD, Intel, Nvidia, and IBM competed for government and scientific clients.

With more companies interested in HPC, cloud providers such as Google, Microsoft Azure, and Penguin Computing on Demand also offer clients access to these powerful servers. Tony Harvey, a senior director analyst at Gartner, said HPC-as-a-service is nothing new, but more kinds of companies are finding new use cases for supercomputers. “I suspect we will see more competition in the space.

“A lot of the companies already offer HPC access, and there are even some that offer novel ways to access GPUs and servers because HPC use has gotten into everything, not just AI,” Harvey said. He added that any move that further democratizes access to HPC shortens the waiting lists for large supercomputers and makes time on them more valuable for those running experiments and predictions. By making HPC more accessible and easier to manage, the new service could change how enterprises use it and encourage innovation and experimentation across many fields.