Azure Batch

 

As the solution architect for the engineering organization, you need to understand the options available for batch processing and high-performance computing (HPC) on Azure. This knowledge helps to determine how you can efficiently render the 3D models of the facilities that the company designs, and how you store all of the related statistical data.

Azure Batch

Azure Batch is a service for working with large-scale parallel and computationally intensive tasks on Azure. Unlike the other options you'll see in this module, Batch is a managed service. You provide data and applications, and you specify whether to run on Windows or Linux, how many machines to use, and what rules apply to autoscaling. Batch handles provisioning of the compute capacity and optimizes the way the work is done in parallel. You only pay for the underlying compute, networking, and storage you use. The Batch scheduling and management service is free.
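As a sketch of what an autoscaling rule looks like, you can attach an autoscale formula to a Batch pool with the Azure CLI. The pool name and thresholds below are hypothetical, and running this requires an Azure subscription and a Batch account:

```shell
# Hypothetical example: scale the number of dedicated nodes with the task backlog.
# $PendingTasks, $TargetDedicatedNodes, and TimeInterval_Minute are Batch autoscale
# service variables; "mypool" and the cap of 10 nodes are placeholders.
az batch pool autoscale enable \
    --pool-id mypool \
    --auto-scale-formula '
      $tasks = avg($PendingTasks.GetSample(TimeInterval_Minute * 5));
      $TargetDedicatedNodes = min(10, $tasks);
    '
```

Batch re-evaluates the formula on an interval and adjusts the pool toward the computed target, so you pay for extra nodes only while there is a backlog.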


What is HPC?

There are many different industries that require very powerful computing resources for specialized tasks. For example:

  • In genetic sciences, gene sequencing.
  • In oil and gas exploration, reservoir simulations.
  • In finance, market modeling.
  • In engineering, physical system modeling.
  • In meteorology, weather modeling.

These tasks require processors that can carry out instructions extremely fast. It's also helpful to run many processors in parallel, to obtain answers within a practical time duration. On-premises HPC systems have many powerful CPUs and, for graphics-intensive tasks, GPUs. They also require fast disks and high-speed memory.


Components of Azure Batch

Batch has several components that work together. An Azure Batch account forms a container for all of the main Batch elements. Within the Batch account, you typically create Batch pools of VMs (often called nodes) running either Windows or Linux. You set up Batch jobs, which act as logical containers with configurable settings for the real unit of work in Batch: the Batch task. A task is highly flexible and can run anything from a command-line instruction to an entire application. Optionally, you can associate an Azure Storage account with the Batch account, and use it to upload input data, download outputs, and stage application installers for the tasks that need them.
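Assuming hypothetical resource names, the account → pool → job → task hierarchy maps onto Azure CLI commands roughly as follows. This is a sketch, not a complete deployment, and it requires an Azure subscription:

```shell
# All names (mybatchacct, myrg, mypool, myjob, task1) are placeholders.
# Create the Batch account (the container for everything else) and log in to it.
az batch account create --name mybatchacct --resource-group myrg --location westeurope
az batch account login --name mybatchacct --resource-group myrg --shared-key-auth

# Create a pool of Linux nodes, then a job on that pool, then a task in the job.
az batch pool create --id mypool --vm-size Standard_A1_v2 \
    --target-dedicated-nodes 2 \
    --image canonical:ubuntuserver:18.04-lts \
    --node-agent-sku-id "batch.node.ubuntu 18.04"
az batch job create --id myjob --pool-id mypool
az batch task create --job-id myjob --task-id task1 \
    --command-line "/bin/bash -c 'echo hello from Batch'"
```

The task's command line runs on whichever pool node the scheduler assigns it to; adding more tasks to the job lets Batch spread the work across the pool in parallel.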

This diagram shows a client application or hosted service interacting with Batch to upload input, create jobs, monitor tasks, and download output.

[Diagram: the components of Azure Batch]



Azure VM HPC instances

The 3D models used in your engineering organization require many calculations to render and use memory resources intensively. You find that standard virtual machines (VMs) render these models relatively slowly. These delays affect the productivity of your engineers, and you'd like to avoid them.

For high-intensity tasks with specialized requirements, you might need to use specialized VMs. Here, you'll learn about VM tiers in Azure that support specialized, high-performance tasks.


H-series VMs

Azure H-series VMs are a family of the most powerful and fastest CPU-based VMs on Azure. These VMs are optimized for applications that require high CPU frequencies or large amounts of memory per core. The basic H-series is well suited to genomic research, seismic and reservoir simulation, financial risk modeling, and molecular modeling.

The VMs feature the Intel Xeon E5-2667 v3 Haswell 3.2 GHz CPU with DDR4 memory. Configurations range from 8 cores and 56 GB at the lower end (the H8 SKU) to 16 cores and 224 GB at the higher end (the H16m SKU).

You can use all of these HPC instances with Azure Batch. When you set up a Batch pool, you can specify that H-series VMs should be used.
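For example, you might request H-series nodes when creating a Batch pool. The pool name is a placeholder, and H-series availability varies by region:

```shell
# Sketch: create a Batch pool of H16r nodes (the RDMA-capable H-series SKU),
# using one of the CentOS-HPC images that ship with InfiniBand drivers.
az batch pool create --id hpcpool --vm-size Standard_H16r \
    --target-dedicated-nodes 4 \
    --image OpenLogic:CentOS-HPC:7.4 \
    --node-agent-sku-id "batch.node.centos 7"
```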

HB-series VMs

HB-series VMs specifically target applications requiring extreme memory bandwidth, particularly fluid dynamics, explicit finite element analysis, and weather modeling. HB VMs have 60 AMD EPYC 7551 processor cores, with 4 GB of RAM per CPU core and 240 GB of memory overall. HB-series VMs provide more than 260 GB/sec of memory bandwidth. This bandwidth is 33 percent faster than x86 alternatives and 2.5 times faster than is standard for most current HPC customers.

HC-series VMs

HC-series VMs are optimized for applications driven by dense computation, such as implicit finite element analysis, reservoir simulation, and computational chemistry. HC VMs have 44 Intel Xeon Platinum 8168 processor cores, with 8 GB of RAM per CPU core and 352 GB of memory overall. HC-series VMs support Intel software tools such as the Intel Math Kernel Library, and feature an all-cores clock speed greater than 3 GHz for most workloads.

Remote Direct Memory Access

The H16r and H16mr SKUs of the H-series, and both the HB- and HC-series VMs, include a second, low-latency, high-throughput network interface that supports Remote Direct Memory Access (RDMA). RDMA enables direct memory access between systems without involving the operating system. On Azure, connections over an InfiniBand network provide this high-speed access.

Message Passing Interface (MPI) is a protocol for communication between computers as they run complex HPC tasks in parallel. To use it, your developers need an implementation of the protocol, typically provided as a library of routines (on Windows, for example, MS-MPI). RDMA can give a significant boost to the performance of MPI applications.
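In Azure Batch, MPI workloads run as multi-instance tasks: a primary task plus a coordination step that runs on every allocated node. As a hedged sketch, a task definition using the Batch `multiInstanceSettings` schema might look like this (the task ID, command lines, and instance count are all placeholders):

```json
{
  "id": "mpi-task",
  "commandLine": "mpirun -np 8 ./mysimulation",
  "multiInstanceSettings": {
    "numberOfInstances": 2,
    "coordinationCommandLine": "echo coordinating",
    "commonResourceFiles": []
  }
}
```

A definition like this can be submitted with `az batch task create --job-id myjob --json-file task.json`, where the job ID and file name are again placeholders.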


InfiniBand interconnects

InfiniBand is a data interconnect hardware standard for HPC. It's often used to accelerate communications between components, both within a single server and between servers. It has been designed to support the highest speeds and the lowest latency for messages between CPUs, and between processors and storage components.

Both HC- and HB-series VMs use a 100 Gb/sec Mellanox EDR InfiniBand interconnect in a non-blocking fat-tree configuration to boost hardware performance.


N-series VMs

Some HPC tasks are both compute-intensive and graphics-intensive. Suppose, for example, you're modeling the behavior of a wing in a wind tunnel, and you want to show a live visualization to help engineers understand that behavior. For these applications, consider using N-series VMs, which include single or multiple NVIDIA GPUs.

NC-series VMs

NC-series VMs use the NVIDIA Tesla K80 GPU card and Intel Xeon E5-2690 v3 processors. This series is the lowest cost of the N-series tiers, but VMs in this tier are capable of running graphics-intensive applications. They also support NVIDIA's CUDA platform, so that you can use the GPUs to run compute instructions.

ND-series VMs

ND-series VMs are optimized for AI and deep learning workloads. They use the NVIDIA Tesla P40 GPU card and Intel Xeon E5-2690 v4 processors. They are fast at running single-precision floating point operations, which are used by AI frameworks including Microsoft Cognitive Toolkit, TensorFlow, and Caffe.


Microsoft HPC Pack

If you need more flexible control of your high-performance infrastructure, or you want to manage both cloud and on-premises VMs, consider using the Microsoft HPC Pack.

In your engineering company, you want to migrate high-performance infrastructure from on-premises datacenters into Azure. Because these systems are business critical, you want to migrate gradually. You need to ensure that you can rapidly respond to demand and manage VMs flexibly during the migration, when there will be both on-premises and cloud VMs.

Here, you will learn how the HPC Pack can manage HPC infrastructure.


What is HPC Pack?

In researching options for the engineering organization, you've looked at Azure Batch and Azure HPC instances. But what if you want full control of the management and scheduling of your clusters of VMs? What if you have a significant investment in on-premises infrastructure in your datacenter? HPC Pack offers a series of installers for Windows that lets you configure your own control and management plane, and highly flexible deployments of on-premises and cloud nodes. By contrast with the exclusively cloud-based Batch, HPC Pack can deploy on-premises, to the cloud, or as a hybrid of both, expanding to the cloud when your on-premises reserves are insufficient.

[Diagram: HPC Pack hybrid deployment]

Think of Microsoft HPC Pack as a version of the Batch management and scheduling control layer, over which you have full control, and for which you have responsibility. Deployment of HPC Pack requires Windows Server 2012 or later, and takes careful consideration to implement.


Plan for HPC Pack

Typically, you should prepare for the installation of HPC Pack with a full review of requirements. You need SQL Server and an Active Directory controller. You must also plan a topology. How many head or control nodes should there be, and how many worker nodes? Do you need to expand up to Azure? If so, you pre-provision Azure nodes as part of the cluster. The size of the main machines that make up the control plane (head and control nodes, SQL Server, and Active Directory domain controller) will depend on the projected cluster size.

When you install HPC Pack, you get a job scheduler with support for both HPC and parallel jobs, along with Microsoft's implementation of the Message Passing Interface (MS-MPI). HPC Pack is highly integrated with Windows, so you can use Visual Studio for parallel debugging. You'll see all the application, networking, and operating system events from the compute nodes in the cluster in a single debugger view.

HPC Pack also offers an advanced job scheduler. You can deploy rapidly, even to nodes not exclusively dedicated to HPC Pack, to Linux-based nodes, and to Azure nodes. That means you can use spare capacity within your datacenter. HPC Pack is an ideal way to use existing infrastructure investments, and to keep finer-grained control over how work is divided than is possible with Batch.
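On a cluster head node, work is typically submitted with HPC Pack's `job` command-line tool. A hedged example, in which the application name, node count, and share path are all hypothetical:

```shell
:: Windows cmd on an HPC Pack head node; myapp.exe, the node count, and the
:: working-directory share are placeholders for your own cluster's values.
job submit /numnodes:4 /workdir:\\headnode\share mpiexec myapp.exe
```

The scheduler queues the job, allocates the requested nodes, and launches the MPI processes across them.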


What is Azure Batch?

Few organizations have the resources to support permanent deployments of super-powerful compute platforms, which may only occasionally be used to capacity. More typically, you need a flexible and scalable compute solution, such as Azure Batch, to provide the computational power.

For a hands-on exercise in running a parallel task with Batch and the Azure CLI, see: https://docs.microsoft.com/en-gb/learn/modules/run-parallel-tasks-in-azure-batch-with-the-azure-cli/4-exercise-create-azure-batch-job-in-cli-to-run-parallel-task
