Amazon Managed Workflows for Apache Airflow (Amazon MWAA) provides a secure and managed environment to run Apache Airflow on AWS. Airflow is often used in highly regulated industries, such as finance and healthcare. These customers may need to further restrict access and traffic to strengthen their security posture beyond what the Amazon MWAA default configurations provide. This post covers some recommended practices.
The principle of least privilege is a fundamental tenet that should be followed diligently. When configuring AWS services, it's essential to grant only the minimum required permissions to resources, avoiding overly broad or permissive policies.
In this post, we explore how to apply the principle of least privilege to your Amazon MWAA environment by tightening network security using security groups, network access control lists (ACLs), and virtual private cloud (VPC) endpoints. We also discuss the Amazon MWAA execution and deployment roles and their respective permissions.
Understanding the Amazon MWAA environment
When an Amazon MWAA environment is created, resources are created in an AWS managed service VPC and in your customer managed VPC. In the customer VPC provided at environment creation, the resources needed to run the Airflow environment are deployed, including schedulers and workers running on Amazon Elastic Container Service (Amazon ECS) clusters. These clusters are deployed in your VPC and use elastic network interfaces (ENIs) with private IP addresses in the customer account. These ENIs span private subnets across two Availability Zones to connect to the Airflow database and web server, which reside in the service-owned account (if in private access mode). The following diagram illustrates this architecture.
VPC security groups act as virtual firewalls that control network traffic at the ENI, or instance, level. Security groups are stateful, meaning that return traffic for an allowed inbound connection is automatically permitted outbound, and vice versa. A newly created security group starts with no inbound rules and an outbound rule allowing all traffic. Consequently, a security group with no inbound rules denies all inbound traffic except responses to connections initiated through the 0.0.0.0/0 outbound rule.
Amazon MWAA offers two web server access modes: public and private. Public web server mode must have a way for traffic to reach the web servers through the public internet. This requires routing to the public internet using public subnets and a NAT gateway. A NAT gateway can be used to provide internet access for resources in private subnets. With private access mode, the security group for the Amazon MWAA environment doesn't need to allow traffic to and from the NAT gateway, and access to the Airflow UI is limited to users with appropriate permissions from within the VPC. An Application Load Balancer is provisioned only in public mode, to route traffic to the public web servers. The customer must provision the rest of the networking components.
If your Amazon MWAA environment needs to communicate with resources outside your VPC (such as external data sources or APIs), you might need to configure appropriate security group rules and routing to allow the necessary traffic. In such cases, you typically use a NAT gateway or VPN connection to communicate with the external resources, and VPC endpoints to reach AWS resources.
For tighter security, you can run an environment with private routing and no internet access, with finer-grained security group rules and VPC endpoint policies applied. Because this post focuses on least privilege, we concentrate on the minimum security requirements for an Amazon MWAA environment.
Security groups: Minimizing permissions
Your Amazon MWAA environment has a security group associated with your VPC's environment resources. This security group is also used by the ENIs created by the interface VPC endpoints that are used to communicate with the database and web server. By default, security groups deny all inbound traffic, and security group rules must be stated explicitly, specifying the ports and source that the instance will allow network traffic from. At a minimum, the Amazon MWAA environment must allow traffic to and from the Amazon Aurora PostgreSQL-Compatible Edition metadata database that is owned and managed by Amazon MWAA. The metadata database is a critical component of Airflow that acts as a centralized source of truth for task execution, configuration, and monitoring. Both the scheduler and the workers require access to this database to perform their respective roles in orchestrating and running tasks. This database listens on TCP port 5432. Additionally, web server traffic can be restricted to HTTPS through TCP port 443. At a minimum, the Amazon MWAA security group must have the two inbound rules detailed in the following table.
Type | Protocol | Port range | Source type | Source |
Custom TCP | TCP | 5432 | Custom | sg-xxxxx / my-mwaa-vpc-security-group |
HTTPS | TCP | 443 | Custom | sg-xxxxx / my-mwaa-vpc-security-group |
Many customers have other AWS resources residing in VPCs that the Amazon MWAA workers need to access. These resources can also be granted network access in a private routing configuration by using security groups. If the resource is in the same security group, add an additional inbound rule with the required port. For example, if an Amazon Redshift cluster is in the same security group, add the following rule.
Type | Protocol | Port range | Source type | Source |
Custom TCP | TCP | 5439 | Custom | sg-xxxxx / my-mwaa-vpc-security-group |
If the Redshift cluster is in a different security group, change the source to the Redshift security group.
Type | Protocol | Port range | Source type | Source |
Custom TCP | TCP | 5439 | Custom | sg-xxxxx / redshift-security-group |
If the resources are in another VPC, VPC peering must be enabled before you can reference that VPC's security group. For resources that don't reside in a subnet, a VPC endpoint can also provide private routing between the Amazon MWAA environment and those resources. For example, a VPC endpoint for Amazon Simple Storage Service (Amazon S3) can provide enhanced security, improved performance, and lower costs.
Network ACLs: Minimizing permissions
Network ACLs manage inbound and outbound traffic at the subnet level with allow or deny rules. A network ACL is stateless, which means that inbound and outbound rules must be specified separately and explicitly. It's used to specify the types of network traffic that are allowed into or out of the instances in a VPC network.
Every Amazon VPC has a default ACL that allows all inbound and outbound traffic, with the following rules.
Rule number | Type | Protocol | Port range | Source | Allow/Deny |
100 | All IPv4 traffic | All | All | 0.0.0.0/0 | Allow |
* | All IPv4 traffic | All | All | 0.0.0.0/0 | Deny |
You can edit the default ACL rules, or create a custom ACL and attach it to your subnets. A subnet can have only one ACL attached at any time, but one ACL can be attached to multiple subnets. To implement least privilege in your Amazon MWAA environment, restrict the inbound ACL to allow traffic from the metadata database and web server, and restrict the outbound ACL to allow traffic only to the clients in the private subnet. Note that the following examples use example private IP ranges for the subnets.
Inbound NACL
Rule number | Type | Protocol | Port range | Source | Allow/Deny | Comments |
100 | Custom TCP | TCP | 5432 | 10.192.21.0/24 | Allow | Allows inbound database traffic from the private subnet |
110 | HTTPS | TCP | 443 | 10.192.21.0/24 | Allow | Allows inbound HTTPS traffic from the private subnet |
* | All traffic | All | All | 0.0.0.0/0 | Deny | Denies all inbound IPv4 traffic not already handled by a preceding rule (not modifiable) |
Outbound NACL
Rule number | Type | Protocol | Port range | Destination | Allow/Deny | Comments |
100 | Custom TCP | TCP | 1024-65535 | 10.192.21.0/24 | Allow | Allows outbound return IPv4 traffic to clients in the private subnet |
* | All traffic | All | All | 0.0.0.0/0 | Deny | Denies all outbound IPv4 traffic not already handled by a preceding rule (not modifiable) |
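Because network ACLs are stateless and evaluated rule by rule, it can help to check an ACL against the traffic you expect before applying it. The following Python sketch models the documented evaluation order (lowest rule number first, first match wins, and the * rule as an implicit deny) and encodes the example rules, assuming the private subnet is 10.192.21.0/24. It is a local model for reasoning about rules, not an AWS API, and it ignores details such as ICMP and IPv6.

```python
"""Sketch: a local model of stateless network ACL evaluation (not an AWS API)."""
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int
    protocol: str     # "tcp" or "all"
    port_range: tuple  # (low, high) destination ports; ignored for "all"
    cidr: str         # source for inbound rules, destination for outbound
    allow: bool


def evaluate(rules, protocol, port, address):
    """Return True if the first matching rule (lowest number) allows the packet."""
    ip = ipaddress.ip_address(address)
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.protocol != "all":
            if rule.protocol != protocol:
                continue
            low, high = rule.port_range
            if not low <= port <= high:
                continue
        if ip in ipaddress.ip_network(rule.cidr):
            return rule.allow
    return False  # the "*" rule: deny everything not matched above


# Example rules from the tables, assuming a 10.192.21.0/24 private subnet.
INBOUND = [
    NaclRule(100, "tcp", (5432, 5432), "10.192.21.0/24", True),
    NaclRule(110, "tcp", (443, 443), "10.192.21.0/24", True),
]
OUTBOUND = [
    NaclRule(100, "tcp", (1024, 65535), "10.192.21.0/24", True),
]

if __name__ == "__main__":
    print(evaluate(INBOUND, "tcp", 5432, "10.192.21.15"))    # database traffic
    print(evaluate(INBOUND, "tcp", 22, "10.192.21.15"))      # SSH is not listed
    print(evaluate(OUTBOUND, "tcp", 50000, "10.192.21.15"))  # ephemeral return port
    print(evaluate(OUTBOUND, "tcp", 443, "10.192.21.15"))    # new outbound HTTPS
```

The last check shows the stateless behavior: the outbound rules allow only return traffic to ephemeral ports, so a new outbound connection to port 443 that crosses the subnet boundary would be denied by these example rules unless another outbound rule permits it.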
VPC endpoints: Minimizing permissions
When you create an Amazon MWAA environment, it's deployed within a VPC. This lets you control the network access and security of your Airflow deployment. However, some customer workloads running in the Amazon MWAA environment might need to orchestrate tasks using other AWS services that reside outside your VPC, such as Amazon S3 to access data, AWS Glue to start extract, transform, and load (ETL) jobs, or Amazon Redshift to run data warehouse queries. To establish a secure and private connection between your Amazon MWAA environment and these AWS services, you can use VPC endpoints. VPC endpoints are virtual devices provisioned within your VPC that act as an entry point for the specified AWS service, allowing your Amazon MWAA environment to communicate with the service using a private IP address, without going through the public internet. The following diagram illustrates this architecture.
VPC endpoints let you keep your Amazon MWAA environment's network traffic within the AWS network, reducing exposure to the public internet and improving the overall security of your Airflow deployment. Although private VPC endpoints are automatically created for the database and web server, a least privileged environment without internet access needs additional VPC endpoints for the other resources that Amazon MWAA requires: Amazon S3, Amazon Simple Queue Service (Amazon SQS), Amazon CloudWatch, and optionally AWS Key Management Service (AWS KMS). For more details, see Creating the required VPC service endpoints in an Amazon VPC with private routing. Beyond these required services, many customers run Amazon MWAA workflows that orchestrate additional AWS services, such as Amazon Redshift, Amazon EMR, and AWS Glue. Let's look at an example VPC endpoint used to connect to Amazon Redshift, which is commonly called in Airflow DAGs through the Redshift operators for workflows that use Amazon Redshift as a data warehouse. For more information on creating Amazon VPC interface endpoints, see Access an AWS service using an interface VPC endpoint.
Create a VPC endpoint
Complete the following steps to create a VPC endpoint using Amazon Virtual Private Cloud (Amazon VPC):
- On the Amazon VPC console, create a new VPC endpoint for the com.amazonaws.region.redshift service, where region is the AWS Region where your Amazon MWAA environment and Redshift cluster are located. Make sure private DNS is enabled.
- Create a VPC endpoint policy. You can use it to limit access to the Redshift cluster to only the Amazon MWAA environment, preventing unauthorized access from other resources. An example policy contains the following elements:
  - The Version field specifies the policy language version.
  - The Statement section contains a single statement that allows the specified actions on the Redshift cluster.
  - The Effect field is set to Allow, which means the policy grants the specified permissions.
  - The Principal field specifies the AWS Identity and Access Management (IAM) role used as your Amazon MWAA execution role, which is allowed to access the Redshift cluster.
  - The Action field lists the specific Redshift actions that the Amazon MWAA execution role is allowed to perform, such as describing the cluster, getting cluster credentials, and restoring from a snapshot.
  - The Resource field specifies the Amazon Resource Name (ARN) of the Redshift cluster that the policy applies to.
- Associate the VPC endpoint with the network that your Amazon MWAA environment uses. For a gateway endpoint, associate it with the route table used by the subnets where your Amazon MWAA environment is deployed. For an interface endpoint, associate it with the two private subnets and the security group used by Amazon MWAA.
- Make sure the security groups associated with the Amazon MWAA environment and the Redshift cluster allow the necessary inbound and outbound traffic between them. This usually includes allowing access on the Redshift port (typically 5439) from the Amazon MWAA environment's security group.
- In the Airflow UI (opened from the Amazon MWAA console), under Admin, Connections, update the Redshift connection details to use the VPC endpoint address instead of the public Redshift endpoint. This makes sure that the connection between Amazon MWAA and Amazon Redshift is secure and stays within the VPC.
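The endpoint steps can also be scripted. The following Python sketch builds an endpoint policy with the elements described in the list (Version, a single Allow statement, the execution role as Principal, three Redshift actions, and the cluster as Resource) and the keyword arguments for boto3's create_vpc_endpoint. The account ID, role ARN, cluster, VPC, subnet, and security group IDs are hypothetical. Adding a dbuser ARN for GetClusterCredentials is our assumption; check the Redshift service authorization reference for the resource types each action uses.

```python
"""Sketch: Redshift interface endpoint request with a least-privilege policy.

All IDs, ARNs, and names are hypothetical placeholders.
"""
import json


def redshift_endpoint_policy(region, account_id, cluster_id, execution_role_arn):
    """Endpoint policy letting only the MWAA execution role call three Redshift APIs."""
    cluster_arn = f"arn:aws:redshift:{region}:{account_id}:cluster:{cluster_id}"
    # Assumption: GetClusterCredentials is authorized against dbuser resources,
    # so a dbuser pattern scoped to the same cluster is listed as well.
    dbuser_arn = f"arn:aws:redshift:{region}:{account_id}:dbuser:{cluster_id}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": execution_role_arn},
                "Action": [
                    "redshift:DescribeClusters",
                    "redshift:GetClusterCredentials",
                    "redshift:RestoreFromClusterSnapshot",
                ],
                "Resource": [cluster_arn, dbuser_arn],
            }
        ],
    }


def redshift_endpoint_request(region, vpc_id, subnet_ids, mwaa_sg_id, policy):
    """Keyword arguments for EC2 CreateVpcEndpoint (boto3 create_vpc_endpoint)."""
    if len(subnet_ids) != 2:
        raise ValueError("pass the two private subnets used by Amazon MWAA")
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.redshift",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": [mwaa_sg_id],
        "PrivateDnsEnabled": True,
        "PolicyDocument": json.dumps(policy),
    }


def create_endpoint(region, request):
    """Send the request; requires boto3 and AWS credentials, so it is not run here."""
    import boto3

    return boto3.client("ec2", region_name=region).create_vpc_endpoint(**request)


if __name__ == "__main__":
    role = "arn:aws:iam::111122223333:role/example-mwaa-execution-role"
    policy = redshift_endpoint_policy("us-east-1", "111122223333", "example-cluster", role)
    request = redshift_endpoint_request(
        "us-east-1", "vpc-example", ["subnet-a", "subnet-b"], "sg-example-mwaa", policy
    )
    print(json.dumps(request, indent=2))
```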
By configuring VPC endpoints for the AWS services your Amazon MWAA environment needs to access, you can provide secure, private, and efficient communication between your Airflow deployment and AWS resources.
Restricting traffic within AWS with customer managed endpoints for Amazon MWAA resources
As mentioned earlier, Amazon MWAA integrates with various AWS services, such as CloudWatch for logging, Amazon S3 for DAGs and requirements, Amazon SQS as messaging middleware, and optionally AWS KMS for encryption. You can create VPC endpoints for these services to make sure traffic stays within the AWS network. Access to these endpoints can be restricted by allowing only the Amazon MWAA security group as the ingress source. For details on how to create these endpoints and policies, see Introducing shared VPC support on Amazon MWAA. If the Amazon MWAA environment was updated after April 2, 2024, it runs on AWS Fargate platform version 1.4 and doesn't use Amazon Elastic Container Registry (Amazon ECR), so you don't need to create a VPC endpoint for it.
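As a sketch of what customer managed endpoints for these services might look like, the following Python builds CreateVpcEndpoint requests for Amazon S3 (as a gateway endpoint), Amazon SQS, CloudWatch Logs, CloudWatch metrics, and optionally AWS KMS, plus an ingress rule for the endpoints' security group that admits HTTPS only from the Amazon MWAA security group. The endpoint types, the service list, and all IDs are assumptions for illustration; compare them with Creating the required VPC service endpoints in an Amazon VPC with private routing before use.

```python
"""Sketch: endpoint requests for the services Amazon MWAA itself relies on.

Service list, endpoint types, and IDs are illustrative assumptions.
"""
import json

# Assumption: CloudWatch is reached through both the Logs ("logs") and the
# metrics ("monitoring") endpoints. KMS is optional and added on request.
INTERFACE_SERVICES = ("sqs", "logs", "monitoring")


def mwaa_endpoint_requests(region, vpc_id, subnet_ids, route_table_ids,
                           endpoint_sg_id, include_kms=False):
    """Return a list of keyword-argument dicts for boto3 create_vpc_endpoint."""
    requests = [
        {
            # S3 as a gateway endpoint, attached to the private route tables.
            "VpcEndpointType": "Gateway",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.s3",
            "RouteTableIds": list(route_table_ids),
        }
    ]
    services = list(INTERFACE_SERVICES) + (["kms"] if include_kms else [])
    for service in services:
        requests.append(
            {
                "VpcEndpointType": "Interface",
                "VpcId": vpc_id,
                "ServiceName": f"com.amazonaws.{region}.{service}",
                "SubnetIds": list(subnet_ids),
                "SecurityGroupIds": [endpoint_sg_id],
                "PrivateDnsEnabled": True,
            }
        )
    return requests


def endpoint_sg_ingress(mwaa_sg_id):
    """Ingress for the endpoints' security group: HTTPS from the MWAA group only."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "UserIdGroupPairs": [{"GroupId": mwaa_sg_id}],
        }
    ]


if __name__ == "__main__":
    reqs = mwaa_endpoint_requests(
        "us-east-1", "vpc-example", ["subnet-a", "subnet-b"], ["rtb-example"], "sg-vpce-example"
    )
    print(json.dumps([r["ServiceName"] for r in reqs], indent=2))
```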
Managing permissions to deploy an Amazon MWAA environment
To create and deploy an Amazon MWAA environment, you need the appropriate permissions granted to your IAM user or role. These permissions can be granted through an IAM policy attached to your user or role. When you create an Amazon MWAA environment, you specify an execution role that the Airflow workers assume to perform tasks. The execution role should have the permissions needed to access the required AWS services and resources, based on your workflow requirements. It's important to follow the principle of least privilege when granting permissions to IAM roles and users: grant only the minimum permissions your Amazon MWAA environment and Airflow workflows need to function correctly.
Amazon MWAA trust policy
Amazon MWAA needs to be able to assume the execution role to perform actions on your behalf. To allow this, create a trust policy that lets the Amazon MWAA service call sts:AssumeRole. To avoid the confused deputy problem, add a condition to the trust policy, replacing the AWS account number and Region as needed.
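A minimal sketch of such a trust policy, expressed as a Python dictionary and serialized to JSON, follows. It assumes the airflow.amazonaws.com and airflow-env.amazonaws.com service principals and uses the aws:SourceAccount and aws:SourceArn global condition keys to address the confused deputy problem; the account ID, Region, and environment name are placeholders.

```python
"""Sketch: execution role trust policy for Amazon MWAA with confused deputy guards."""
import json


def mwaa_trust_policy(account_id, region, environment_name="*"):
    """Trust policy letting Amazon MWAA assume the execution role.

    The conditions pin the calling account and the environment ARN, so the
    service can assume the role only when acting for your environment.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": ["airflow.amazonaws.com", "airflow-env.amazonaws.com"]
                },
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnLike": {
                        "aws:SourceArn": (
                            f"arn:aws:airflow:{region}:{account_id}"
                            f":environment/{environment_name}"
                        )
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(mwaa_trust_policy("111122223333", "us-east-1", "MyEnvironment"), indent=2))
```

The default environment_name of "*" matches any environment in that account and Region; passing a specific name narrows the trust to a single environment.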
VPC endpoint permissions for the deployer role
Although the service-linked role creates the VPC endpoints, the deployer role requires permissions to create VPC endpoints and to perform a dry run. You can limit these permissions by allowing the ec2:CreateVpcEndpoint action and specifying resource ARNs for VPC endpoints, VPCs, subnets, and security groups. Additionally, you can use the aws:CalledVia condition key to restrict these calls to those made through the airflow.amazonaws.com service.
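The following Python sketch shows one way to express that scoped permission as an IAM policy statement. The VPC, subnet, and security group IDs are hypothetical, and the ARN formats assume the commercial aws partition. Because aws:CalledVia is a multivalued key, it is tested here with ForAnyValue:StringEquals.

```python
"""Sketch: deployer-role statement for creating MWAA VPC endpoints (IDs are hypothetical)."""
import json


def deployer_vpc_endpoint_statement(region, account_id, vpc_id, subnet_ids, security_group_ids):
    """Allow ec2:CreateVpcEndpoint only on named resources and only via Amazon MWAA."""
    prefix = f"arn:aws:ec2:{region}:{account_id}"
    resources = [f"{prefix}:vpc-endpoint/*", f"{prefix}:vpc/{vpc_id}"]
    resources += [f"{prefix}:subnet/{subnet_id}" for subnet_id in subnet_ids]
    resources += [f"{prefix}:security-group/{sg_id}" for sg_id in security_group_ids]
    return {
        "Sid": "CreateMwaaVpcEndpointsOnlyViaAirflow",
        "Effect": "Allow",
        # A dry run (DryRun=True) of CreateVpcEndpoint is checked against this
        # same action, so no separate action is listed.
        "Action": "ec2:CreateVpcEndpoint",
        "Resource": resources,
        "Condition": {
            "ForAnyValue:StringEquals": {"aws:CalledVia": ["airflow.amazonaws.com"]}
        },
    }


if __name__ == "__main__":
    statement = deployer_vpc_endpoint_statement(
        "us-east-1", "111122223333", "vpc-example", ["subnet-a", "subnet-b"], ["sg-example"]
    )
    print(json.dumps({"Version": "2012-10-17", "Statement": [statement]}, indent=2))
```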
Amazon MWAA execution role: Required permissions
When creating an Amazon MWAA environment, you need to specify an execution role that grants the permissions Airflow needs to interact with other AWS services. Instead of using a wildcard policy, you can create a custom policy with the minimum required permissions.
For example, an execution role policy that allows Amazon MWAA to interact with various services using an AWS managed key grants the permissions needed to work with CloudWatch Logs, Amazon S3, Amazon SQS, and AWS KMS, while explicitly specifying the resources it can access. You can further refine such a policy based on your specific requirements.
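To make this concrete, here is a hedged Python sketch that generates a policy of that shape. The statements follow the pattern of the sample execution role in the Amazon MWAA documentation as we understand it, but the environment name, bucket name, and exact action lists are assumptions to verify against the current documentation. The final KMS statement uses NotResource so that only keys outside your account (such as the key AWS manages for Amazon MWAA) can be used, and only through Amazon SQS.

```python
"""Sketch: execution role policy for Amazon MWAA with the AWS managed key.

Environment name, bucket name, and action lists are illustrative assumptions.
"""
import json


def execution_role_policy(region, account_id, env_name, bucket_name):
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Publish Airflow metrics for this environment only.
                "Effect": "Allow",
                "Action": ["airflow:PublishMetrics"],
                "Resource": f"arn:aws:airflow:{region}:{account_id}:environment/{env_name}",
            },
            {   # Read DAGs, plugins, and requirements from the environment's bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject*", "s3:GetBucket*", "s3:List*"],
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            },
            {   # Write and read logs only in this environment's log groups.
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogStream", "logs:CreateLogGroup", "logs:PutLogEvents",
                    "logs:GetLogEvents", "logs:GetLogRecord", "logs:GetLogGroupFields",
                    "logs:GetQueryResults",
                ],
                "Resource": f"arn:aws:logs:{region}:{account_id}:log-group:airflow-{env_name}-*",
            },
            {   # Assumed not to support useful resource-level scoping, so "*".
                "Effect": "Allow",
                "Action": [
                    "logs:DescribeLogGroups", "s3:GetAccountPublicAccessBlock",
                    "cloudwatch:PutMetricData",
                ],
                "Resource": "*",
            },
            {   # Celery task queues that Amazon MWAA creates.
                "Effect": "Allow",
                "Action": [
                    "sqs:ChangeMessageVisibility", "sqs:DeleteMessage", "sqs:GetQueueAttributes",
                    "sqs:GetQueueUrl", "sqs:ReceiveMessage", "sqs:SendMessage",
                ],
                "Resource": f"arn:aws:sqs:{region}:*:airflow-celery-*",
            },
            {   # Only keys outside this account, and only when called through SQS.
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:DescribeKey", "kms:GenerateDataKey*", "kms:Encrypt"],
                "NotResource": f"arn:aws:kms:*:{account_id}:key/*",
                "Condition": {"StringLike": {"kms:ViaService": [f"sqs.{region}.amazonaws.com"]}},
            },
        ],
    }


if __name__ == "__main__":
    policy = execution_role_policy("us-east-1", "111122223333", "my-env", "my-dags-bucket")
    print(json.dumps(policy, indent=2))
```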
If you use a KMS customer managed key instead, the execution policy follows the same structure, but its AWS KMS permissions are granted on that specific key.
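With a customer managed key, the part that changes is the KMS statement, which can be scoped to the key itself. The following is a minimal sketch, assuming the key is used through Amazon SQS and Amazon S3 in the same Region; the key ARN is a placeholder.

```python
"""Sketch: KMS statement for an execution role that uses a customer managed key."""
import json


def cmk_kms_statement(region, key_arn):
    """Allow the execution role to use one customer managed key via SQS and S3 only."""
    return {
        "Effect": "Allow",
        "Action": ["kms:Decrypt", "kms:DescribeKey", "kms:GenerateDataKey*", "kms:Encrypt"],
        # Scoped to the single customer managed key.
        "Resource": key_arn,
        # Assumption: the key is only needed through these two services.
        "Condition": {
            "StringLike": {
                "kms:ViaService": [
                    f"sqs.{region}.amazonaws.com",
                    f"s3.{region}.amazonaws.com",
                ]
            }
        },
    }


if __name__ == "__main__":
    key = "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
    print(json.dumps(cmk_kms_statement("us-east-1", key), indent=2))
```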
For the customer managed key use case, also attach a JSON key policy statement to the key that gives CloudWatch Logs access to it, so that the Airflow logs in CloudWatch Logs can be encrypted with the key.
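The following Python sketch builds that key policy statement and merges it into an existing key policy without duplicating it. It assumes the logs.region.amazonaws.com service principal and an encryption-context condition that limits use to log groups in your account. The attach_to_key helper uses boto3's get_key_policy and put_key_policy, needs credentials, and is not executed here.

```python
"""Sketch: key policy statement letting CloudWatch Logs use a customer managed key."""
import copy
import json


def logs_key_policy_statement(region, account_id):
    return {
        "Sid": "AllowCloudWatchLogsForAirflow",
        "Effect": "Allow",
        "Principal": {"Service": f"logs.{region}.amazonaws.com"},
        "Action": [
            "kms:Encrypt*", "kms:Decrypt*", "kms:ReEncrypt*",
            "kms:GenerateDataKey*", "kms:Describe*",
        ],
        "Resource": "*",  # inside a key policy, "*" refers to this key
        # Only log groups in this account and Region may use the key.
        "Condition": {
            "ArnLike": {
                "kms:EncryptionContext:aws:logs:arn": f"arn:aws:logs:{region}:{account_id}:*"
            }
        },
    }


def add_statement(key_policy, statement):
    """Return a copy of key_policy with statement appended unless its Sid exists."""
    merged = copy.deepcopy(key_policy)
    statements = merged.setdefault("Statement", [])
    if all(s.get("Sid") != statement["Sid"] for s in statements):
        statements.append(statement)
    return merged


def attach_to_key(key_id, region, account_id):
    """Read, merge, and write back the key policy (requires AWS credentials)."""
    import boto3

    kms = boto3.client("kms", region_name=region)
    current = json.loads(kms.get_key_policy(KeyId=key_id, PolicyName="default")["Policy"])
    updated = add_statement(current, logs_key_policy_statement(region, account_id))
    kms.put_key_policy(KeyId=key_id, PolicyName="default", Policy=json.dumps(updated))
```

Merging by Sid keeps the operation idempotent, so running it twice does not add a duplicate statement to the key policy.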
You can attach multiple policies to the execution role as needed to allow your workers to access additional AWS resources. For example, to enable Amazon EMR access, you can create a JSON policy that contains the narrowest permissions you can configure for the EMR operations your workflows perform.
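As an illustration, the following Python sketch builds a narrow Amazon EMR policy. It assumes the DAGs create a cluster, add steps, poll them, and terminate the cluster (for example with the Amazon provider's EMR operators and sensors), and that the default EMR_DefaultRole and EMR_EC2_DefaultRole roles are used. The action list is our assumption of what those operations call; trim it to what your DAGs actually need.

```python
"""Sketch: narrow Amazon EMR permissions for an MWAA execution role.

Action list and role names are assumptions; adjust to your DAGs.
"""
import json


def emr_access_policy(region, account_id,
                      service_role="EMR_DefaultRole",
                      instance_role="EMR_EC2_DefaultRole"):
    cluster_arn = f"arn:aws:elasticmapreduce:{region}:{account_id}:cluster/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CreateClusters",
                "Effect": "Allow",
                # Assumption: RunJobFlow creates a cluster that has no ARN yet,
                # so it is not scoped to a cluster resource here.
                "Action": ["elasticmapreduce:RunJobFlow"],
                "Resource": "*",
            },
            {
                "Sid": "ManageExistingClusters",
                "Effect": "Allow",
                "Action": [
                    "elasticmapreduce:DescribeCluster",
                    "elasticmapreduce:AddJobFlowSteps",
                    "elasticmapreduce:DescribeStep",
                    "elasticmapreduce:ListSteps",
                    "elasticmapreduce:TerminateJobFlows",
                ],
                "Resource": cluster_arn,
            },
            {
                # Only the two EMR roles may be handed to the new cluster.
                "Sid": "PassOnlyTheEmrRoles",
                "Effect": "Allow",
                "Action": ["iam:PassRole"],
                "Resource": [
                    f"arn:aws:iam::{account_id}:role/{service_role}",
                    f"arn:aws:iam::{account_id}:role/{instance_role}",
                ],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(emr_access_policy("us-east-1", "111122223333"), indent=2))
```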
Conclusion
In this post, we discussed best practices for least privilege configuration in Amazon MWAA. By following these approaches, you can adhere to the principle of least privilege and maintain a secure posture within your Amazon MWAA environment, without compromising functionality or relying on overly permissive policies. Security is always the top priority; to learn more about security in Amazon MWAA, see Security in Amazon Managed Workflows for Apache Airflow and Security best practices on Amazon MWAA.
About the Authors
Elizabeth Davis is a Sr. Solutions Architect at Amazon Web Services (AWS). She currently works with educational technology companies and has a passion for serverless and data orchestration technologies. She has been an Amazon MWAA subject matter expert (SME) for the last 3+ years.
Mark Richman is a Principal Solutions Architect at Amazon Web Services with 30 years of experience building complex web and enterprise software. He contributes to Apache Airflow, bringing his expertise in cloud computing and serverless technologies to the open source platform. Mark is also an accomplished writer and speaker who has authored industry publications and AWS courses while regularly presenting at industry events.