Tuesday, March 21, 2023

Amazon EMR on EKS gets up to 19% performance boost running on AWS Graviton3 Processors vs. Graviton2

Amazon EMR on EKS is a deployment option that lets you run Spark workloads on Amazon Elastic Kubernetes Service (Amazon EKS) easily. It allows you to innovate faster with the latest Apache Spark on Kubernetes architecture while benefiting from the performance-optimized Spark runtime powered by Amazon EMR. This deployment option elects Amazon EKS as its underlying compute to orchestrate containerized Spark applications with better price performance.

AWS continually innovates to provide choice and better price-performance for our customers, and the third-generation Graviton processor is the next step in the journey. Amazon EMR on EKS now supports Amazon Elastic Compute Cloud (Amazon EC2) C7g, the latest AWS Graviton3 instance family. On a single EKS cluster, we measured EMR runtime for Apache Spark performance by comparing the C7g and C6g families across selected instance sizes of 4XL, 8XL, and 12XL. We're excited to observe a maximum 19% performance gain over the sixth-generation C6g Graviton2 instances, which leads to a 15% cost reduction.

In this post, we discuss the performance test results that we observed while running the same EMR Spark runtime on different Graviton-based EC2 instance types.

For some use cases, such as the benchmark test, running a data pipeline that requires a mix of CPU types for granular-level cost efficiency, or migrating an existing application from Intel to Graviton-based instances, we usually spin up different clusters that host separate types of processors, such as x86_64 vs. arm64. However, Amazon EMR on EKS has made it easier. In this post, we also provide guidance on running Spark with multiple CPU architectures in a common EKS cluster, so that we can save the significant time and effort of setting up a separate cluster to isolate the workloads.

Infrastructure innovation

AWS Graviton3 is the latest generation of AWS-designed Arm-based processors, and C7g is the first Graviton3 instance in AWS. The C family is designed for compute-intensive workloads, including batch processing, distributed analytics, data transformations, log analysis, and more. Additionally, C7g instances are the first in the cloud to feature DDR5 memory, which provides 50% higher memory bandwidth compared to DDR4 memory, to enable high-speed access to data in memory. All these innovations are well-suited for big data workloads, especially the in-memory processing framework Apache Spark.

The following table summarizes the technical specifications for the tested instance types:

Instance Name | vCPUs | Memory (GiB) | EBS-Optimized Bandwidth (Gbps) | Network Bandwidth (Gbps) | On-Demand Hourly Rate
c6g.4xlarge | 16 | 32 | 4.75 | Up to 10 | $0.544
c7g.4xlarge | 16 | 32 | Up to 10 | Up to 15 | $0.58
c6g.8xlarge | 32 | 64 | 9 | 12 | $1.088
c7g.8xlarge | 32 | 64 | 10 | 15 | $1.16
c6g.12xlarge | 48 | 96 | 13.5 | 20 | $1.632
c7g.12xlarge | 48 | 96 | 15 | 22.5 | $1.74

These instances are all built on AWS Nitro System, a collection of AWS-designed hardware and software innovations. The Nitro System offloads the CPU virtualization, storage, and networking functions to dedicated hardware and software, delivering performance that's nearly indistinguishable from bare metal. Specifically, C7g instances have included support for Elastic Fabric Adapter (EFA), which becomes the standard on this instance family. It allows our applications to communicate directly with network interface cards, providing lower and more consistent latency. Additionally, these are all Amazon EBS-optimized instances, and C7g provides higher dedicated bandwidth for EBS volumes, which can result in better I/O performance and quicker read/write operations in Spark.

Performance test results

To quantify performance, we ran TPC-DS benchmark queries for Spark at 3 TB scale. These queries are derived from TPC-DS standard SQL scripts, and the test results are not comparable to other published TPC-DS benchmark results. Apart from the benchmark standards, a single Amazon EMR 6.6 Spark runtime (compatible with Apache Spark version 3.2.0) was used as the data processing engine across six different managed node groups on an EKS cluster: C6g_4, C7g_4, C6g_8, C7g_8, C6g_12, C7g_12. These groups are named after instance type to distinguish the underlying compute resources. Each group can automatically scale between 1 and 30 nodes within its corresponding instance type. Architecting the EKS cluster in such a way, we can run and compare our experiments in parallel, each of which is hosted in a single node group, i.e., an isolated compute environment on a common EKS cluster. It also makes it possible to run an application with multiple CPU architectures on the single cluster. Check out the sample EKS cluster configuration and benchmark job examples for more details.
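As a small illustration of the naming convention, the sketch below maps a node group name to the instance type it appears to stand for. The mapping rule (family, underscore, size multiplier) is inferred from the six names listed above and from the later example in which the C7g_4 group runs c7g.4xlarge instances; the helper name instance_type_for is hypothetical and not part of any AWS tooling.

```shell
# Hypothetical helper: derive the EC2 instance type from a node group name,
# assuming names follow the <Family>_<size> pattern used in this post.
instance_type_for() {
  family=$(printf '%s' "${1%_*}" | tr '[:upper:]' '[:lower:]')  # C7g -> c7g
  size=${1#*_}                                                   # 4, 8, or 12
  printf '%s.%sxlarge\n' "$family" "$size"
}

# List the six node groups from the benchmark with their inferred instance types.
for group in C6g_4 C7g_4 C6g_8 C7g_8 C6g_12 C7g_12; do
  printf '%s -> %s\n' "$group" "$(instance_type_for "$group")"
done
```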

We measure the Graviton performance and cost improvements using two calculations: total query runtime and geometric mean of the total runtime. The following table shows the results for equivalent-sized C6g and C7g instances and the same Spark configurations.

Benchmark Attributes | 12 XL | 8 XL | 4 XL
Task parallelism (spark.executor.cores * spark.executor.instances) | 188 cores (4*47) | 188 cores (4*47) | 188 cores (4*47)
spark.executor.memory | 6 GB | 6 GB | 6 GB
Number of EC2 instances | 5 | 7 | 16
EBS volume | 4 * 128 GB io1 disk | 4 * 128 GB io1 disk | 4 * 128 GB io1 disk
Provisioned IOPS per volume | 6400 | 6400 | 6400
Total query runtime on C6g (sec) | 2099 | 2098 | 2042
Total query runtime on C7g (sec) | 1728 | 1738 | 1660
Total runtime improvement with C7g | 18% | 17% | 19%
Geometric mean query time on C6g (sec) | 9.74 | 9.88 | 9.77
Geometric mean query time on C7g (sec) | 8.40 | 8.32 | 8.08
Geometric mean improvement with C7g | 13.8% | 15.8% | 17.3%
EMR on EKS memory usage cost on C6g (per run) | $0.28 | $0.28 | $0.28
EMR on EKS vCPU usage cost on C6g (per run) | $1.26 | $1.25 | $1.24
Total cost per benchmark run on C6g (EC2 + EKS cluster + EMR price) | $6.36 | $6.02 | $6.52
EMR on EKS memory usage cost on C7g (per run) | $0.23 | $0.23 | $0.22
EMR on EKS vCPU usage cost on C7g (per run) | $1.04 | $1.03 | $0.99
Total cost per benchmark run on C7g (EC2 + EKS cluster + EMR price) | $5.49 | $5.23 | $5.54
Estimated cost reduction with C7g | 13.7% | 13.2% | 15%
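The sketch below shows one plausible way to arrive at the "Total cost per benchmark run" rows. It is not the authors' published method. It assumes that EC2 and the EKS cluster are billed in proportion to the total query runtime, that the EKS cluster fee is $0.10 per hour, and that EBS storage is not included; the function name run_cost is hypothetical. With these assumptions, the 12 XL results come out at the same values as the table, and the other sizes land within about a cent.

```shell
# Hypothetical helper:
#   run_cost INSTANCES HOURLY_RATE RUNTIME_SEC EMR_VCPU_COST EMR_MEM_COST
# Prints the estimated cost of one benchmark run in USD.
run_cost() {
  LC_ALL=C awk -v n="$1" -v rate="$2" -v sec="$3" -v vcpu="$4" -v mem="$5" 'BEGIN {
    hours = sec / 3600
    ec2 = n * rate * hours     # EC2 On-Demand cost for the whole node group
    eks = 0.10 * hours         # ASSUMPTION: EKS cluster fee of $0.10 per hour
    printf "%.2f\n", ec2 + eks + vcpu + mem
  }'
}

# Inputs taken from the two tables above (12 XL column).
run_cost 5 1.632 2099 1.26 0.28   # c6g.12xlarge -> 6.36
run_cost 5 1.74  1728 1.04 0.23   # c7g.12xlarge -> 5.49
```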

The total number of cores and memory are identical across all benchmarked instances, and four provisioned IOPS SSD disks were attached to each EBS-optimized instance for optimal disk I/O performance. To allow for comparison, these configurations were intentionally chosen to match the settings in other EMR on EKS benchmarks. Check out the previous benchmark blog post Amazon EMR on Amazon EKS provides up to 61% lower costs and up to 68% performance improvement for Spark workloads for C5 instances based on x86_64 Intel CPU.

The table indicates C7g instances have consistent performance improvement compared to equivalent C6g Graviton2 instances. Our test results showed 17–19% improvement in total query runtime for selected instance sizes, and 13.8–17.3% improvement in geometric mean. On cost, we observed 13.2–15% cost reduction on C7g performance tests compared to C6g while running the 104 TPC-DS benchmark queries.
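The percentages quoted here can be rechecked from the table with a few lines of shell. The helpers below are a minimal sketch with hypothetical names: pct_reduction computes how much lower the C7g figure is relative to C6g, and geomean computes a geometric mean, assuming it is taken over per-query runtimes. The post does not list per-query runtimes, so the geomean call uses made-up values purely to show the calculation.

```shell
# Hypothetical helper: percentage by which NEW is lower than BASELINE.
pct_reduction() {
  LC_ALL=C awk -v base="$1" -v new="$2" \
    'BEGIN { printf "%.1f\n", (base - new) / base * 100 }'
}

# Hypothetical helper: geometric mean of the arguments, exp(mean(log(x))).
geomean() {
  printf '%s\n' "$@" | LC_ALL=C awk \
    '{ sum += log($1); n++ } END { printf "%.2f\n", exp(sum / n) }'
}

# 4 XL column of the results table.
pct_reduction 2042 1660   # total query runtime  -> 18.7
pct_reduction 9.77 8.08   # geometric mean       -> 17.3
pct_reduction 6.52 5.54   # cost per run         -> 15.0

# Illustrative values only, NOT benchmark data.
geomean 2 8 32            # -> 8.00
```

The geometric mean is less sensitive than the total runtime to a handful of long-running queries, which may be one reason the two improvement figures differ.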

Data shuffle in a Spark workload

Generally, big data frameworks schedule computation tasks for different nodes in parallel to achieve optimal performance. To proceed with its computation, a node must have the results of computations from upstream. This requires moving intermediate data from multiple servers to the nodes where the data is required, which is termed shuffling data. In many Spark workloads, data shuffle is an inevitable operation, so it plays an important role in performance assessments. This operation may involve a high rate of disk I/O and network data transmission, and could burn a significant amount of CPU cycles.

If your workload is I/O bound or bottlenecked by current data shuffle performance, one recommendation is to benchmark on improved hardware. Overall, C7g offers better EBS and network bandwidth compared to equivalent C6g instance types, which may help you optimize performance. Therefore, in the same benchmark test, we captured the following extra information, which is broken down into per-instance-type network/IO improvements.

Based on the TPC-DS query test result, this graph illustrates the percentage increases of data shuffle operations in four categories: maximum disk read and write, and maximum network received and transmitted. In comparison to C6g instances, the disk read performance improved between 25–45%, whereas the disk write performance increase was 34–47%. On the network throughput comparison, we observed an increase of 21–36%.

Run an Amazon EMR on EKS job with multiple CPU architectures

If you're evaluating migrating to Graviton instances for Amazon EMR on EKS workloads, we recommend testing the Spark workloads based on your real-world use cases. If you need to run workloads across multiple processor architectures, for example to test the performance for Intel and Arm CPUs, follow the walkthrough in this section to get started with some concrete ideas.

Build a single multi-arch Docker image

To build a single multi-arch Docker image (x86_64 and arm64), complete the following steps:

  1. Get the Docker Buildx CLI extension. Docker Buildx is a CLI plugin that extends the Docker command to support the multi-architecture feature. Upgrade to the latest Docker Desktop or manually download the CLI binary. For more details, check out Working with Buildx.
  2. Validate the version after the installation:
    docker buildx version

  3. Create a new builder that gives access to the new multi-architecture features (you only have to perform this task once):
    docker buildx create --name mybuilder --use

  4. Log in to your own Amazon ECR registry:
    AWS_REGION=us-east-1   # example Region; use your own
    ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
    ECR_URL=$ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com
    aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $ECR_URL

  5. Get the EMR Spark base image from AWS (log in first, then pull):
    # SRC_ECR_URL must be set to the Amazon EMR on EKS base image registry for your Region
    aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $SRC_ECR_URL
    docker pull $SRC_ECR_URL/spark/emr-6.6.0:latest

  6. Build and push a custom Docker image.

In this case, we build a single Spark benchmark utility Docker image on top of Amazon EMR 6.6. It supports both Intel and Arm processor architectures:

  • linux/amd64 – x86_64 (also known as AMD64 or Intel 64)
  • linux/arm64 – Arm

docker buildx build \
--platform linux/amd64,linux/arm64 \
-t $ECR_URL/eks-spark-benchmark:emr6.6 \
-f docker/benchmark-util/Dockerfile \
--build-arg SPARK_BASE_IMAGE=$SRC_ECR_URL/spark/emr-6.6.0:latest \
--push .
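After the push, it can be worth confirming that the image manifest really lists both platforms before submitting jobs to mixed node groups. The sketch below is one way to do that. It relies on docker buildx imagetools inspect, which prints the platforms in a manifest list, and on a small hypothetical filter named has_both_platforms that only checks whether both platform strings appear in whatever text it reads from standard input.

```shell
# Hypothetical helper: succeeds only if the text on stdin mentions both
# linux/amd64 and linux/arm64 (for example, imagetools inspect output).
has_both_platforms() {
  manifest=$(cat)
  printf '%s\n' "$manifest" | grep -q 'linux/amd64' &&
    printf '%s\n' "$manifest" | grep -q 'linux/arm64'
}

# Usage (requires Docker Buildx and access to your ECR registry):
#   docker buildx imagetools inspect "$ECR_URL/eks-spark-benchmark:emr6.6" \
#     | has_both_platforms && echo "multi-arch image OK"

# Offline demonstration with sample text shaped like the inspect output.
printf 'Platform: linux/amd64\nPlatform: linux/arm64\n' \
  | has_both_platforms && echo "multi-arch image OK"
```

If only one platform shows up, Spark pods scheduled on the other architecture would be expected to fail at container start, so this check is cheap insurance.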

Submit Amazon EMR on EKS jobs with and without Graviton

For our first example, we submit a benchmark job to the Graviton3 node group that spins up c7g.4xlarge instances.

The following is not a complete script. Check out the full version of the example on GitHub.

aws emr-containers start-job-run \
--virtual-cluster-id $VIRTUAL_CLUSTER_ID \
--name emr66-c7-4xl \
--execution-role-arn $EMR_ROLE_ARN \
--release-label emr-6.6.0-latest \
--job-driver '{
    "sparkSubmitJobDriver": {
    "entryPoint": "local:///usr/lib/spark/examples/jars/eks-spark-benchmark-assembly-1.0.jar",
    "sparkSubmitParameters": "........"}}' \
--configuration-overrides '{
"applicationConfiguration": [{
    "classification": "spark-defaults",
    "properties": {
        "spark.kubernetes.container.image": "'$ECR_URL'/eks-spark-benchmark:emr6.6",
        "spark.kubernetes.node.selector.eks.amazonaws.com/nodegroup": "C7g_4"
    }}]}'

In the following example, we run the same job on non-Graviton C5 instances with Intel 64 CPU. The full version of the script is available on GitHub.

aws emr-containers start-job-run \
--virtual-cluster-id $VIRTUAL_CLUSTER_ID \
--name emr66-c5-4xl \
--execution-role-arn $EMR_ROLE_ARN \
--release-label emr-6.6.0-latest \
--job-driver '{
    "sparkSubmitJobDriver": {
    "entryPoint": "local:///usr/lib/spark/examples/jars/eks-spark-benchmark-assembly-1.0.jar",
    "sparkSubmitParameters": "........"}}' \
--configuration-overrides '{
"applicationConfiguration": [{
    "classification": "spark-defaults",
    "properties": {
        "spark.kubernetes.container.image": "'$ECR_URL'/eks-spark-benchmark:emr6.6",
        "spark.kubernetes.node.selector.eks.amazonaws.com/nodegroup": "C5_4"
    }}]}'

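The two submissions differ only in the job name and the node group selector, so the overrides JSON can be generated rather than hand-edited, which also avoids quoting mistakes. The following is a minimal sketch under that assumption. The function name spark_overrides and the placeholder image URI are hypothetical, while the two property keys are the ones used in the commands above.

```shell
# Hypothetical helper: spark_overrides IMAGE_URI NODEGROUP
# Prints a --configuration-overrides JSON document that pins the Spark pods
# to one EKS managed node group and uses the given container image.
spark_overrides() {
  cat <<EOF
{
  "applicationConfiguration": [{
    "classification": "spark-defaults",
    "properties": {
      "spark.kubernetes.container.image": "$1",
      "spark.kubernetes.node.selector.eks.amazonaws.com/nodegroup": "$2"
    }
  }]
}
EOF
}

# Placeholder image URI for demonstration; substitute your own ECR image.
spark_overrides "example.registry/eks-spark-benchmark:emr6.6" C7g_4

# Possible usage with the CLI call shown above:
#   --configuration-overrides "$(spark_overrides "$ECR_URL/eks-spark-benchmark:emr6.6" C5_4)"
```

Because the same multi-arch image tag is used for both node groups, only the second argument needs to change between the Graviton and Intel runs.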

In May 2022, the Graviton3 instance family was made available to Amazon EMR on EKS. After running the performance-optimized EMR Spark runtime on the selected latest Arm-based Graviton3 instances, we observed up to 19% performance increase and up to 15% cost savings compared to C6g Graviton2 instances. Because Amazon EMR on EKS offers 100% API compatibility with open-source Apache Spark, you can quickly step into the evaluation process with no application changes.

If you're wondering how much performance gain you can achieve with your use case, try out the benchmark solution or the EMR on EKS Workshop. You can also contact your AWS Solutions Architects, who can be of assistance alongside your innovation journey.

About the author

Melody Yang is a Senior Big Data Solution Architect for Amazon EMR at AWS. She is an experienced analytics leader working with AWS customers to provide best practice guidance and technical advice in order to assist their success in data transformation. Her areas of interest are open-source frameworks and automation, data engineering, and DataOps.
