Wednesday, April 2, 2025

Use Batch Processing Gateway to automate job management in multi-cluster Amazon EMR on EKS environments.

Amazon Web Services (AWS) customers often process vast amounts of data, frequently at petabyte scale. In complex enterprise environments with multiple workloads and diverse operational demands, organizations often choose a multi-cluster configuration because it offers the following advantages:

  • Resilience – If a single cluster fails, the remaining clusters continue processing critical workloads, maintaining business continuity.
  • Isolation – Stronger job isolation improves security and simplifies regulatory compliance.
  • Scalability – Distributing workloads across clusters allows the system to scale with fluctuating demand.
  • Performance – Reduced Kubernetes scheduling latency and network contention improve job execution times.
  • Flexibility – Segmenting workloads into multiple clusters enables straightforward experimentation and cost optimization.

Despite these benefits, a multi-cluster setup has a significant drawback: there is no straightforward way to distribute workloads and balance load across the clusters, which reduces the overall efficiency of the system.

This post presents a solution to the problem: a centralized gateway that automates job management and routing in multi-cluster environments.

Challenges with multi-cluster environments

In a multi-cluster environment, Spark jobs on Amazon EMR on EKS must be submitted to different clusters by various users. This architecture introduces several significant challenges:

  • Users must maintain a separate connection configuration for each target cluster.
  • Managing each user connection individually increases complexity and operational overhead.
  • There is no built-in capability to route jobs across multiple clusters, which complicates setup, resource allocation, cost visibility, and fault tolerance.
  • Without load balancing, the system lacks fault tolerance and suffers from reduced availability.

Batch Processing Gateway (BPG) addresses these challenges by providing a single point for submitting Spark jobs. BPG routes each job to a suitable EMR on EKS cluster, providing load balancing, simplified endpoint management, and improved reliability for scalable and resilient operations. This solution is particularly beneficial for customers with complex Amazon EMR on EKS setups involving multiple clusters.

Despite its significant benefits, the current design of BPG works only with the Spark Kubernetes Operator; its applicability to other Spark submission methods and schedulers has not been established.

Solution overview

A gateway is a design abstraction that encapsulates access to an external system or resource. Here, the resource is the set of EMR on EKS clusters running Spark, and the gateway serves as a single point of entry to that resource. Any code or connection interacts only with the gateway’s interface, and the gateway translates each incoming API request into the API offered by the underlying resource.
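The gateway pattern described above can be sketched in a few lines of Python. This is an illustrative sketch only — the class names, CRD fields, and round-robin selection are assumptions for demonstration, not BPG's actual code.

```python
from dataclasses import dataclass

@dataclass
class ClusterClient:
    """Hypothetical stand-in for the Kubernetes API server of one EMR on EKS cluster."""
    name: str

    def submit_spark_application(self, crd: dict) -> str:
        # A real client would POST the CRD to the Kubernetes API server;
        # here we just return a cluster-prefixed submission ID.
        return f"{self.name}-{crd['metadata']['name']}"

class Gateway:
    """Single entry point that hides the individual clusters behind one interface."""

    def __init__(self, clusters: list):
        self.clusters = clusters
        self._next = 0

    def submit(self, job_request: dict) -> str:
        # Translate the incoming request into the resource's API format (a CRD).
        crd = {
            "apiVersion": "sparkoperator.k8s.io/v1beta2",
            "kind": "SparkApplication",
            "metadata": {"name": job_request["applicationName"]},
            "spec": {"mainClass": job_request["mainClass"]},
        }
        # Pick a target cluster (simple round-robin for illustration).
        cluster = self.clusters[self._next % len(self.clusters)]
        self._next += 1
        return cluster.submit_spark_application(crd)

gw = Gateway([ClusterClient("spark-cluster-a"), ClusterClient("spark-cluster-b")])
sid = gw.submit({"applicationName": "sparkpi", "mainClass": "org.apache.spark.examples.SparkPi"})
print(sid)  # prints "spark-cluster-a-sparkpi"
```

The key property is that callers depend only on `Gateway.submit`; the cluster clients can change without affecting the callers.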

BPG is a purpose-built gateway that provides a seamless interface to Spark on Kubernetes environments. It abstracts the details of the underlying Spark clusters on Amazon Elastic Kubernetes Service (Amazon EKS) from users. BPG runs in its own EKS cluster and communicates with the Kubernetes API servers of multiple other EKS clusters. Spark applications submitted by end users are routed by BPG to one of the underlying EMR on EKS clusters.

Submitting Apache Spark applications through BPG for Amazon EMR on EKS involves the following steps:

  1. A user submits a job to BPG through a user-facing interface.
  2. BPG parses the request, converts it into a custom resource definition (CRD), and submits the CRD to one of the EMR on EKS clusters according to predefined routing rules.
  3. The Spark Kubernetes Operator interprets the job specification and initiates job execution on that cluster.
  4. The Kubernetes scheduler assigns the job's pods to suitable nodes for execution.
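The rule-based routing in step 2 can be sketched as a weighted choice over the clusters configured for a queue. The queue and cluster names mirror the example configuration used later in this post; the function itself is an illustration, not BPG's implementation.

```python
import random

# Illustrative cluster configuration: each cluster serves a queue with a weight
# (mirroring BPG's weight-based cluster selection; names are assumptions).
CLUSTERS = [
    {"name": "spark-cluster-a-v", "queue": "dev", "weight": 50},
    {"name": "spark-cluster-b-v", "queue": "dev", "weight": 50},
]

def route(queue: str, rng: random.Random) -> str:
    """Pick a target cluster for a queue, proportionally to cluster weights."""
    candidates = [c for c in CLUSTERS if c["queue"] == queue]
    if not candidates:
        raise ValueError(f"no cluster serves queue {queue!r}")
    weights = [c["weight"] for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]["name"]

rng = random.Random(42)
print(route("dev", rng))  # one of the two clusters, chosen by weight
```

With equal weights, repeated calls distribute jobs roughly evenly; unequal weights shift the proportion accordingly.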

The following figure illustrates the key components of BPG. You can learn more about BPG on GitHub.

Image showing the high-level details of Batch Processing Gateway

The solution to the challenges identified earlier is to deploy BPG in front of several existing EMR on EKS clusters. The following diagram illustrates the end-to-end architecture.

Image showing the end-to-end architecture of Batch Processing Gateway

Source code

You can find the code base in the GitHub repository.

The following sections outline the steps to implement the solution.

Prerequisites

Before deploying this solution, confirm that all prerequisites are satisfied.

Clone the repositories to your local machine.

We assume that all repositories are cloned into the home directory (~/). All relative paths provided are based on this assumption. If you have cloned the repositories to a different location, adjust the paths accordingly.

  1. Clone the batch-processing-gateway-on-emr-on-eks GitHub repository with the following command:
cd ~/
git clone git@github.com:aws-samples/batch-processing-gateway-on-emr-on-eks.git

The BPG repository is under active development. To provide a consistent deployment experience, we have pinned this post to the specific stable commit hash aa3e5c8be973bee54ac700ada963667e5913c865.

Before cloning any repository, review it for security issues and follow your organization's security practices.

  1. Clone the BPG GitHub repository and check out the pinned commit with the following commands:
git clone git@github.com:apple/batch-processing-gateway.git
cd batch-processing-gateway
git checkout aa3e5c8be973bee54ac700ada963667e5913c865


Creating the EMR on EKS clusters is not the focus of this post. For your convenience, we have provided instructions for setting up EMR on EKS virtual clusters named spark-cluster-a-v and spark-cluster-b-v in the repository; follow those steps to create the clusters.

After completing these steps, you should have two EMR on EKS virtual clusters, spark-cluster-a-v and spark-cluster-b-v, running on the EKS clusters spark-cluster-a and spark-cluster-b, respectively.

To verify that the clusters were created successfully, open the Amazon EMR console and check the virtual clusters listed in the navigation pane.

Image showing the Amazon EMR on EKS setup

Set up BPG on Amazon EKS

To set up BPG on Amazon EKS, follow these steps:

  1. Change to the appropriate directory:
cd ~/batch-processing-gateway-on-emr-on-eks/bpg/
  1. Set the AWS Region:
export AWS_REGION="<>"
  1. Create a key pair. Be sure to follow your organization's best practices for key pair management.
aws ec2 create-key-pair --region "" --key-name ekskp --key-type ed25519 --key-format pem --query "KeyMaterial" --output text > ekskp.pem
chmod 400 ekskp.pem
ssh-keygen -y -f ekskp.pem > eks_publickey.pem
chmod 400 eks_publickey.pem

You're now ready to create the EKS cluster.

By default, eksctl creates an EKS cluster in a dedicated virtual private cloud (VPC). To avoid reaching the default soft limit on the number of VPCs in an account, we use the --vpc-public-subnets parameter to create the cluster in an existing VPC. For this post, we deploy the solution in the default VPC. Modify the following commands to deploy the solution in the VPC and subnets that align with your organization's security and compliance practices, and refer to the official eksctl documentation for details.

  1. Retrieve the public subnets of your VPC:
export DEFAULT_FOR_AZ_SUBNET=$(aws ec2 describe-subnets --region "${AWS_REGION}" --filters Name=default-for-az,Values=true --query "Subnets[?AvailabilityZone!='us-east-1e'].SubnetId" --output json | jq -r 'join(",")')
  1. Create the cluster:
eksctl create cluster --name bpg-cluster --region "${AWS_REGION}" --vpc-public-subnets "${DEFAULT_FOR_AZ_SUBNET}" --with-oidc --ssh-access --ssh-public-key eks_publickey.pem --instance-types=m5.xlarge --managed
  1. In the Amazon EKS console navigation pane, choose Clusters to verify the successful provisioning of the bpg-cluster cluster.

Image showing the Amazon EKS based BPG cluster setup

In the following steps, we make the necessary modifications to the BPG code base.

For your convenience, we have provided the required files in the batch-processing-gateway-on-emr-on-eks repository. You can copy these files into the appropriate locations in the batch-processing-gateway repository.

  1. Replace the pom.xml file:
cp ~/batch-processing-gateway-on-emr-on-eks/bpg/pom.xml ~/batch-processing-gateway/pom.xml
  1. Replace the LogDao.java file:
cp ~/batch-processing-gateway-on-emr-on-eks/bpg/LogDao.java ~/batch-processing-gateway/src/main/java/com/apple/spark/core/LogDao.java
  1. Replace the Dockerfile:
cp ~/batch-processing-gateway-on-emr-on-eks/bpg/Dockerfile ~/batch-processing-gateway/Dockerfile

Now you’re ready to build your Docker image.

  1. Create a private Amazon Elastic Container Registry (Amazon ECR) repository:
aws ecr create-repository --repository-name bpg --region "${AWS_REGION}"
  1. Get the AWS account ID:
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
  1. Authenticate Docker with your Amazon ECR registry:
aws ecr get-login-password --region "${AWS_REGION}" | docker login --username AWS --password-stdin ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com
  1. Build the Docker image:
cd ~/batch-processing-gateway/ && docker build --platform=linux/amd64 -t bpg:1.0.0 --build-arg VERSION="1.0.0" --build-arg BUILD_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ") --build-arg GIT_COMMIT=$(git rev-parse HEAD) --progress=plain --no-cache .
  1. Tag the image:
docker tag bpg:1.0.0 "${AWS_ACCOUNT_ID}".dkr.ecr."${AWS_REGION}".amazonaws.com/bpg:1.0.0
  1. Push the image to your Amazon ECR repository:
docker push "${AWS_ACCOUNT_ID}".dkr.ecr."${AWS_REGION}".amazonaws.com/bpg:1.0.0

The ImagePullPolicy in the batch-processing-gateway GitHub repository is set to IfNotPresent. Update the image tag if you need to update the image.

  1. To verify that the Docker image was created and pushed successfully, open the Amazon ECR console, choose Repositories in the navigation pane, and locate the bpg repository:

Image showing the Amazon ECR setup

Set up an Amazon Aurora MySQL database

Complete the following steps to set up an Amazon Aurora MySQL database:

  1. List the default subnets for the Availability Zones:
aws ec2 describe-subnets --region "${AWS_REGION}" --filters Name=default-for-az,Values=true --query 'Subnets[].SubnetId' --output json | jq -r '.[]'
  1. Create a subnet group:
aws rds create-db-subnet-group --db-subnet-group-name "bpg-rds-subnetgroup" --db-subnet-group-description "BPG Subnet Group for RDS" --subnet-ids '["",""]' --region "${AWS_REGION}"
  1. List the default VPC:
export DEFAULT_VPC=$(aws ec2 describe-vpcs --region "${AWS_REGION}" --filters Name=isDefault,Values=true --query 'Vpcs[0].VpcId' --output text)
  1. Create a security group:
aws ec2 create-security-group --group-name bpg-rds-securitygroup --description "BPG Security Group for RDS" --vpc-id "${DEFAULT_VPC}" --region "${AWS_REGION}"
  1. List the bpg-rds-securitygroup security group ID:
export BPG_RDS_SG=$(aws ec2 describe-security-groups --filters Name=group-name,Values=bpg-rds-securitygroup --query "SecurityGroups[0].GroupId" --output text)
  1. Create a Regional Aurora MySQL database cluster:
aws rds create-db-cluster --db-name bpg --db-cluster-identifier bpg --engine aurora-mysql --engine-version 8.0.mysql_aurora.3.06.1 --master-username admin --manage-master-user-password --vpc-security-group-ids "${BPG_RDS_SG}" --db-subnet-group-name bpg-rds-subnetgroup --region "${AWS_REGION}"
  1. Create a DB writer instance in the cluster:
aws rds create-db-instance --db-instance-identifier bpg --db-cluster-identifier bpg --db-instance-class db.r5.large --engine aurora-mysql --region "${AWS_REGION}"
  1. To verify the successful creation of the RDS Regional cluster and writer instance, open the Amazon RDS console, choose Databases in the navigation pane, and check for the bpg database.

Image showing the RDS setup

Set up network connectivity

The security groups of EKS clusters are attached to the nodes and the control plane (when using managed node groups). In this step, we configure the node security group of the bpg-cluster to communicate with spark-cluster-a, spark-cluster-b, and the bpg Aurora RDS cluster.

  1. Identify the security groups of the bpg-cluster, spark-cluster-a, spark-cluster-b, and the bpg Aurora RDS cluster:
aws ec2 describe-instances --filters Name=tag:eks:cluster-name,Values=bpg-cluster --query "Reservations[0].Instances[0].SecurityGroups[?contains(GroupName, 'eks-cluster-sg-bpg-cluster-')].GroupId" --region "" --output text | uniq
aws eks describe-cluster --name spark-cluster-a --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" --output text
aws eks describe-cluster --name spark-cluster-b --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" --output text
aws ec2 describe-security-groups --filters Name=group-name,Values=bpg-rds-securitygroup --query "SecurityGroups[0].GroupId" --output text
  1. Allow the node security group of the bpg-cluster to communicate with spark-cluster-a and spark-cluster-b (port 443) and the bpg Aurora RDS cluster (port 3306):
aws ec2 authorize-security-group-ingress --group-id "" --protocol tcp --port 443 --source-group ""
aws ec2 authorize-security-group-ingress --group-id "" --protocol tcp --port 3306 --source-group ""

Deploy BPG

We deploy BPG using weight-based cluster selection. spark-cluster-a-v and spark-cluster-b-v are each configured with a queue named dev and weight=50. With this configuration, jobs are expected to be distributed statistically equally across the two clusters.
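With both clusters at weight=50 on the dev queue, a large number of submissions should split roughly evenly between them. The following simulation is an illustration of what statistically equal distribution means here; it is not BPG code.

```python
import random
from collections import Counter

# Simulate 10,000 weighted submissions to two clusters with weight 50 each
# (cluster names mirror the configuration described in this post).
clusters = {"spark-cluster-a-v": 50, "spark-cluster-b-v": 50}
rng = random.Random(7)

names, weights = zip(*clusters.items())
counts = Counter(rng.choices(names, weights=weights, k=10_000))

for name in names:
    share = counts[name] / 10_000
    print(f"{name}: {share:.1%}")
# Each cluster receives close to 50% of the jobs.
```

Changing the weights (for example, 70/30) shifts the expected split proportionally, which is useful for gradually draining or warming up a cluster.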

  1. Get the bpg-cluster context:
BPG_CLUSTER_CONTEXT=$(kubectl config view --output json | jq -r '.contexts[] | select(.name | contains("bpg-cluster")) | .name')
kubectl config use-context "${BPG_CLUSTER_CONTEXT}"
  1. Create a Kubernetes namespace for BPG:
kubectl create namespace bpg

The Helm chart for BPG requires a values.yaml file. This file contains many key-value pairs documenting details of each EMR on EKS cluster, EKS cluster, and Aurora cluster. Updating the values.yaml file manually is cumbersome, so we have automated its generation.

  1. Run the following script to generate the values.yaml file:
cd ~/batch-processing-gateway-on-emr-on-eks/bpg
chmod 755 create-bpg-values-yaml.sh
./create-bpg-values-yaml.sh
  1. Deploy the Helm chart. Make sure the image tag in values.template.yaml and values.yaml matches the Docker image tag pushed earlier.
cp ~/batch-processing-gateway/helm/batch-processing-gateway/values.yaml ~/batch-processing-gateway/helm/batch-processing-gateway/values.yaml.$(date +'%Y%m%d%H%M%S') \
  && cp ~/batch-processing-gateway-on-emr-on-eks/bpg/values.yaml ~/batch-processing-gateway/helm/batch-processing-gateway/values.yaml \
  && cd ~/batch-processing-gateway/helm/batch-processing-gateway/
kubectl config use-context ""
helm install batch-processing-gateway . --values values.yaml -n bpg
  1. Verify the deployment by listing the pods and inspecting their logs:
kubectl get pods --namespace bpg
kubectl logs <> --namespace bpg
  1. Exec into the BPG pod and verify the health check:
kubectl exec -it <> -n bpg -- bash
curl -u admin:admin localhost:8080/skatev2/healthcheck/status

We get the following output:

{"status":"OK"}

BPG is now successfully deployed on the Amazon EKS cluster.

Test the solution

To test the solution, submit multiple Spark jobs by running the following sample code several times. The code submits the SparkPi Spark job to BPG, which in turn submits the jobs to one of the EMR on EKS clusters based on the configured parameters.

  1. Switch to the bpg-cluster context:
kubectl config get-contexts | awk 'NR==1 || /bpg-cluster/'
kubectl config use-context "<>"
  1. Identify the bpg pod name:
kubectl get pods --namespace bpg
  1. Exec into the bpg pod:

kubectl exec -it "<>" -n bpg -- bash

  1. Submit jobs to spark-cluster-a and spark-cluster-b using the following curl command:

curl -u user:pass localhost:8080/skatev2/spark -i -X POST \
  -H 'Content-Type: application/json' \
  -d '{
  "applicationName": "SparkPiDemo",
  "sparkVersion": "3.5.0",
  "mainApplicationFile": "local:///usr/lib/spark/examples/jars/spark-examples.jar",
  "mainClass": "org.apache.spark.examples.SparkPi",
  "driver": {
    "cores": 1,
    "memory": "2g",
    "serviceAccount": "emr-containers-sa-spark",
    "labels": {
      "version": "3.5.0"
    }
  },
  "executor": {
    "instances": 1,
    "cores": 1,
    "memory": "2g",
    "labels": {
      "version": "3.5.0"
    }
  }
}'

After each submission, BPG reports the cluster to which the job was submitted. For example:

HTTP/1.1 200 OK
Date: Sat, 10 Aug 2024 16:17:15 GMT
Content-Type: application/json
Content-Length: 267
[{"submissionId":"spark-cluster-a-f72a7ddcfde14f4390194d4027c1e1d6"},
 {"submissionId":"spark-cluster-a-d1b359190c7646fa9d704122fbf8c580"},
 {"submissionId":"spark-cluster-b-7b61d5d512bb4adeb1dd8a9977d605df"}]
  1. Verify that the jobs are running in the EMR on EKS clusters spark-cluster-a and spark-cluster-b:
kubectl config get-contexts | awk 'NR==1 || /spark-cluster-(a|b)/'
kubectl get pods -n spark-operator --context "<>"
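Each submissionId returned by BPG is prefixed with the name of the cluster that received the job, so you can tally the distribution directly. Here is a small illustrative sketch using the sample IDs from the output above:

```python
from collections import Counter

# Sample submission IDs as returned by BPG (the cluster name prefixes a hex suffix).
submission_ids = [
    "spark-cluster-a-f72a7ddcfde14f4390194d4027c1e1d6",
    "spark-cluster-a-d1b359190c7646fa9d704122fbf8c580",
    "spark-cluster-b-7b61d5d512bb4adeb1dd8a9977d605df",
]

def cluster_of(submission_id: str) -> str:
    # Drop the trailing hex suffix and its separator, keeping the cluster name.
    return submission_id.rsplit("-", 1)[0]

counts = Counter(cluster_of(s) for s in submission_ids)
print(dict(counts))  # {'spark-cluster-a': 2, 'spark-cluster-b': 1}
```

Over many submissions with equal weights, the two counts should converge toward each other.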

You can view the Spark driver logs to find the value of Pi calculated by the job:

kubectl logs <> --namespace spark-operator --context "<>"

After the job completes successfully, you should see a log entry like the following:

Pi is roughly 3.1452757263786317

We have now verified the weight-based routing of Spark jobs across multiple clusters.

Clean up

To clean up your resources, complete the following steps:

  1. Delete the EMR on EKS virtual clusters:
VIRTUAL_CLUSTER_ID=$(aws emr-containers list-virtual-clusters --region="" --query "virtualClusters[?name=='spark-cluster-a-v' && state=='RUNNING'].id" --output text)
aws emr-containers delete-virtual-cluster --region="" --id "${VIRTUAL_CLUSTER_ID}"
VIRTUAL_CLUSTER_ID=$(aws emr-containers list-virtual-clusters --region="" --query "virtualClusters[?name=='spark-cluster-b-v' && state=='RUNNING'].id" --output text)
aws emr-containers delete-virtual-cluster --region="" --id "${VIRTUAL_CLUSTER_ID}"
  1. Delete the AWS Identity and Access Management (IAM) role:
aws iam delete-role-policy --role-name sparkjobrole --policy-name EMR-Spark-Job-Execution && aws iam delete-role --role-name sparkjobrole
  1. Delete the RDS DB instance and DB cluster:
aws rds delete-db-instance --db-instance-identifier 'bpg' --skip-final-snapshot
aws rds delete-db-cluster --db-cluster-identifier 'bpg' --skip-final-snapshot
  1. Delete the bpg-rds-securitygroup security group and bpg-rds-subnetgroup subnet group:
BPG_SG=$(aws ec2 describe-security-groups --filters "Name=group-name,Values=bpg-rds-securitygroup" --query "SecurityGroups[0].GroupId" --output text)
if [ -n "$BPG_SG" ]; then
  aws ec2 delete-security-group --group-id "$BPG_SG"
fi
aws rds delete-db-subnet-group --db-subnet-group-name bpg-rds-subnetgroup
  1. Delete the EKS clusters:
eksctl delete cluster --region= --name=bpg-cluster
eksctl delete cluster --region= --name=spark-cluster-a
eksctl delete cluster --region= --name=spark-cluster-b
  1. Delete the bpg ECR repository:
aws ecr delete-repository --repository-name bpg --region "" --force
  1. Delete the key pairs:
aws ec2 delete-key-pair --key-name 'ekskp'
aws ec2 delete-key-pair --key-name 'emrkp'

Conclusion

In this post, we explored the challenges of managing workloads on EMR on EKS clusters and the advantages of a multi-cluster deployment model. We introduced Batch Processing Gateway (BPG), a solution that simplifies job management, improves reliability, and enables horizontal scaling in multi-cluster environments. By implementing BPG, we demonstrated a practical application of the gateway architecture pattern for submitting Spark jobs on Amazon EMR on EKS. This post provides a comprehensive understanding of the problem, the benefits of the gateway pattern, and the steps to implement BPG.

We encourage you to evaluate your existing Spark on Amazon EMR on EKS implementation and consider adopting this solution. It allows users to manage Spark applications on Kubernetes through a simple API, without having to worry about the underlying implementation details.

For this post, we focused on the implementation details of BPG. As a next step, you can explore integrating BPG with clients such as Amazon Managed Workflows for Apache Airflow (Amazon MWAA) or similar schedulers. You can also explore integrating BPG with the Apache YuniKorn scheduler to submit jobs using YuniKorn queues.


About the Authors

Image of Author: Umair Nawaz is a Senior DevOps Architect at Amazon Web Services. He works on building secure architectures and advises enterprises on efficient software delivery practices. He is motivated to solve problems thoughtfully by applying modern technologies.

Image of Author: Ravikiran Rao is a Data Architect at Amazon Web Services and is passionate about solving complex data challenges for various customers. Outside of work, he is a theater enthusiast and an amateur tennis player.

Image of Author: Sri Potluri is a Cloud Infrastructure Architect at Amazon Web Services. He is passionate about solving complex problems and delivering well-structured solutions for diverse customers. His expertise spans multiple cloud disciplines, and he provides reliable infrastructure solutions tailored to the unique needs of each project.

Image of Author: Suvojit Dasgupta is a Principal Data Architect at Amazon Web Services. He leads a team of skilled engineers designing and building large-scale data solutions for AWS customers. He specializes in developing and implementing innovative data architectures to solve complex business challenges.
