Thursday, April 3, 2025

What are the most cost-effective strategies for managing Amazon Managed Workflows for Apache Airflow (MWAA) to maximize resource utilization and minimize costs?

Amazon Managed Workflows for Apache Airflow (MWAA) is a fully managed service that lets you orchestrate large-scale data pipelines and workflows with ease. With Amazon MWAA, you can define Directed Acyclic Graphs (DAGs) that describe your workflows without the administrative overhead of scaling the underlying infrastructure. This post provides guidance on how to maximize productivity and reduce costs by applying best practices.

An Amazon MWAA environment comprises four essential Airflow components deployed across fleets of AWS compute resources: a scheduler, which orchestrates tasks; workers, which execute them; a web server, which provides the user interface; and a metadata database, which tracks workflow state. Optimizing cost while maintaining performance becomes crucial when handling diverse or unpredictable workloads.

This best-practices guide provides actionable insights on achieving cost optimization and efficient performance in Amazon MWAA environments, supplemented by detailed explanations and real-world scenarios. You don't need to apply every best practice to a given Amazon MWAA workload; you can choose and implement the specific strategies that suit your requirements.

Right-sizing your Amazon MWAA environment

To get the best price-performance from Amazon MWAA, it is crucial to size your environment properly so it can scale to run your workloads concurrently. The Amazon MWAA environment class you choose determines the number and variety of concurrent tasks the worker nodes can handle. Amazon MWAA offers five environment classes, each suited to different workload sizes. The following steps can help you right-size your Amazon MWAA environment.

Monitor resource utilization

To optimize your Amazon MWAA environment, start by monitoring current resource utilization. You can monitor the fundamental aspects of your environment with Amazon CloudWatch, which aggregates raw data into readable, near-real-time metrics. These environment metrics give you greater visibility into key performance indicators, helping you evaluate your environments and pinpoint areas for improvement in your workflows. Based on the concurrent tasks you run, you can adjust the environment size along with the maximum and minimum number of workers required. CloudWatch provides CPU and memory utilization metrics for all the underlying AWS resources that Amazon MWAA uses. Amazon MWAA also publishes additional environment metrics, including the number of base workers, additional workers, schedulers, and web servers.

Analyze your workload patterns

As you analyze your workloads, you'll uncover areas ripe for optimization. Examine workflow structures, parallel task execution, and individual task durations. Track CPU and memory usage patterns during periods of peak demand. Analyzing CloudWatch metrics alongside Airflow logs yields valuable insight into your pipeline's performance. Identify recurring tasks, bottlenecks, and resource-intensive processes so you can allocate capacity effectively. Determining your workload's resource requirements lets you make an informed decision about which Amazon MWAA environment class to use.

Choose the right environment class

Match your requirements to the environment class (from small to x-large) that can efficiently accommodate your workload. You can vertically scale an existing environment up or down by using the API, the AWS CLI, or the SDK. Note that a change in environment class requires planned downtime.
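As a sketch of such a vertical scaling operation, the following uses the boto3 MWAA update_environment API (a real API; the environment name and target class here are hypothetical):

```python
def resize_request(env_name: str, env_class: str) -> dict:
    """Build the arguments for mwaa.update_environment to change the class."""
    return {"Name": env_name, "EnvironmentClass": env_class}

params = resize_request("my-mwaa-env", "mw1.medium")
print(params)

# Applying the change requires AWS credentials and triggers the environment
# update (with its associated downtime window):
# import boto3
# boto3.client("mwaa").update_environment(**params)
```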

Fine-tune configuration parameters

Fine-tuning Apache Airflow configuration parameters is critical for optimizing workflow efficiency and achieving cost savings. You can tune settings related to auto scaling, parallelism, logging, and DAG code optimization.

Auto scaling

Amazon MWAA supports automatic scaling, dynamically adjusting the number of workers and web servers based on workload demand. You can configure the minimum and maximum number of Airflow workers that run in your environment. The worker auto scaling algorithm uses the RunningTasks and QueuedTasks metrics to calculate the required number of workers: the sum of tasks running plus tasks queued, divided by tasks per worker. If the required number of workers exceeds the current count, Amazon MWAA adds workers up to the value specified by the maximum worker configuration.
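The scaling heuristic above can be sketched in a few lines of Python (a simplified model of the documented formula, with the ceiling and the min/max clamp made explicit):

```python
import math

def required_workers(running: int, queued: int, tasks_per_worker: int,
                     min_workers: int, max_workers: int) -> int:
    """ceil((RunningTasks + QueuedTasks) / tasks per worker), clamped
    to the configured minimum and maximum worker counts."""
    needed = math.ceil((running + queued) / tasks_per_worker)
    return max(min_workers, min(needed, max_workers))

# 190 in-flight tasks at 20 tasks per worker -> 9.5, rounded up to 10
print(required_workers(running=120, queued=70, tasks_per_worker=20,
                       min_workers=1, max_workers=10))  # 10
```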

Amazon MWAA auto scaling also scales down gracefully as demand drops. Consider a hypothetical Amazon MWAA environment with a minimum of 1 and a maximum of 10 workers, where each worker can run up to 20 tasks. Every day at 8:00 AM, the DAGs start a large computation involving 190 concurrent tasks. Amazon MWAA automatically scales to 10 workers, because 190 requested tasks at 20 tasks per worker yields 9.5 workers, rounded up to 10. At 10:00 AM, roughly half the tasks are complete, leaving 105 still running or queued. Amazon MWAA then scales down to 6 workers, because 105 tasks at 20 tasks per worker yields 5.25 workers, rounded up to 6. Workers that are still running tasks remain protected during downsizing until their tasks finish, with no interruption to their work. As queued and running tasks decrease, Amazon MWAA continues to remove workers without affecting running tasks, eventually reaching the configured minimum.

Amazon MWAA also scales web servers automatically based on CPU usage and the number of active connections. Amazon MWAA ensures that your Airflow environment scales to meet increased demand, whether driven by REST API requests, AWS CLI usage, or many concurrent users in the Airflow UI. You can specify the maximum and minimum number of web servers when configuring your Amazon MWAA environment.

Logging and metrics

This section covers how to choose and configure appropriate log settings and CloudWatch metrics.

Choose the right log levels

When logging is enabled, Amazon MWAA sends Airflow logs to Amazon CloudWatch. You can view the logs to identify Airflow task delays or workflow errors without additional third-party tools. Enable logging to monitor Airflow's DAG processing, task, scheduler, web server, and worker logs. You can enable Airflow logs at the INFO, WARNING, ERROR, or CRITICAL level; for the level you select, Amazon MWAA sends logs of that level and all higher severity levels. For standard operations, reducing log levels decreases the volume of logs shipped and therefore the overall cost. For example, you might use DEBUG or INFO in development and user acceptance testing (UAT), and WARNING or ERROR in production.

Set an appropriate log retention policy

By default, logs in CloudWatch are kept indefinitely and never expire. To reduce CloudWatch storage costs, configure a retention policy on your Amazon MWAA log groups so that older log events are deleted automatically.
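For example, a retention policy can be applied with the CloudWatch Logs put_retention_policy API; the log group name below follows the airflow-&lt;environment&gt;-&lt;LogType&gt; pattern but is hypothetical, and 30 days is an example value:

```python
def retention_policy(log_group: str, days: int = 30) -> dict:
    """Arguments for logs.put_retention_policy."""
    return {"logGroupName": log_group, "retentionInDays": days}

params = retention_policy("airflow-my-mwaa-env-Task", days=30)
print(params)

# To apply (requires AWS credentials):
# import boto3
# boto3.client("logs").put_retention_policy(**params)
```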

Select required CloudWatch metrics

You can select which Airflow metrics are sent to CloudWatch by using an Amazon MWAA configuration option. Refer to the full list of supported Airflow metrics. Metrics such as schedule delay and duration success are published per DAG, while others, such as task instance metrics (for example, ti.finish), are published per task within a DAG.

As a result, the total number of DAGs and tasks directly influences your CloudWatch metric ingestion costs. To manage CloudWatch costs, publish only a curated set of key metrics. For example, the following configuration publishes only metrics whose names begin with the prefixes scheduler and executor:

metrics.statsd_allow_list = scheduler,executor

We recommend using metrics.statsd_allow_list with a carefully chosen set of prefixes. Where your Airflow version supports it, pattern matching with regular expressions is a more flexible approach than matching only on the beginning of a metric name.

Set up CloudWatch dashboards and alarms

You can create custom CloudWatch dashboards and configure alarms to monitor the health of an Amazon MWAA environment.

Monitor key performance indicators such as workflow runs and Airflow metadata database size to detect anomalies or potential issues in your Amazon MWAA setup. Configuring alarms enables proactive monitoring of your environment, allowing timely intervention to keep it healthy.
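As one hedged example, an alarm on a queued-tasks metric could be defined as follows; the metric name, dimension, and threshold are assumptions to verify against the metrics your environment actually publishes:

```python
def queued_tasks_alarm(env_name: str, threshold: float = 100) -> dict:
    """Arguments for cloudwatch.put_metric_alarm on an assumed MWAA metric."""
    return {
        "AlarmName": f"{env_name}-queued-tasks-high",
        "Namespace": "AmazonMWAA",
        "MetricName": "QueuedTasks",
        "Dimensions": [{"Name": "Environment", "Value": env_name}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 2,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }

params = queued_tasks_alarm("my-mwaa-env")
print(params["AlarmName"])

# To create the alarm (requires AWS credentials):
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**params)
```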

Optimize the Secrets Manager backend

Airflow provides a mechanism for securely storing secrets such as variables and connection details. By default, these secrets are stored in the Airflow metadata database. Airflow users can optionally configure a centrally managed secrets backend, such as AWS Secrets Manager. When one is specified, Airflow checks this alternative secrets backend first whenever a connection or variable is requested. If the alternative backend contains the requested value, it is returned; otherwise, Airflow falls back to querying its metadata database. The number of API calls made to Secrets Manager is one of the factors that determines its cost.

In the Amazon MWAA console, you can configure the Secrets Manager backend paths for the connections and variables that Airflow will use. By default, Airflow searches the configured backend for every connection and variable it looks up. To minimize the number of API calls that Amazon MWAA makes to Secrets Manager on your behalf, configure a lookup pattern. Specifying a pattern narrows the paths that Airflow searches, which can lower the cost of using Secrets Manager with Amazon MWAA.

You can also enable the secrets cache by setting AIRFLOW__SECRETS__USE_CACHE, with time-to-live (TTL) support, to reduce the frequency of Secrets Manager API calls.
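Assuming an Airflow version (2.7 or later) where the secrets cache is configurable, the corresponding configuration options might look like the following:

```
secrets.use_cache = True
secrets.cache_ttl_seconds = 900
```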

To restrict lookups to specific subsets of connections, variables, or configurations in Secrets Manager, use the corresponding *_lookup_pattern parameter. This parameter accepts a regular expression as a string. For example, to look up only connections whose names start with "m" in Secrets Manager, your configuration should resemble the following code:

secrets:
  backend: airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend
  backend_kwargs:
    connections_prefix: "airflow/connections"
    connections_lookup_pattern: "^m"
    profile_name: "default"

DAG code optimization

Both the scheduler and the workers are affected by DAG parsing. The scheduler parses the DAG and places runnable tasks in a queue; an available worker then picks up a task from the queue. At that point, the worker knows the dag_id and the associated Python file, along with some related metadata, and must parse the Python file itself in order to execute the task.

DAG parsing therefore happens twice in sequence: first by the scheduler, then by the worker. Because workers must parse the DAG, parsing time adds directly to task execution time, which determines the number of workers required and, ultimately, the worker cost.

For example, consider 200 DAGs with 10 tasks each, where each task takes 60 seconds to run and DAG parsing takes 20 seconds. Assuming each worker has 20 task slots and the work must finish within one hour, the calculation is:

  • Total tasks across all DAGs: 200 × 10 = 2,000
  • Time per task: 60 seconds of execution plus 20 seconds of DAG parsing = 80 seconds
  • Total task time: 2,000 × 80 = 160,000 seconds
  • Capacity per worker: 20 task slots × 3,600 seconds = 72,000 task-seconds per hour
  • Workers needed: 160,000 ÷ 72,000 ≈ 2.22, rounded up to 3

Now suppose DAG parsing takes 100 seconds instead:

  • Total tasks across all DAGs: 2,000
  • Time per task: 60 + 100 = 160 seconds
  • Total task time: 2,000 × 160 = 320,000 seconds
  • Capacity per worker: 72,000 task-seconds per hour
  • Workers needed: 320,000 ÷ 72,000 ≈ 4.44, rounded up to 5

Because the DAG parsing time increased from 20 seconds to 100 seconds, the number of worker nodes required rose from three to five, with a corresponding rise in compute cost.
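The arithmetic above can be reproduced with a short script (it assumes, as in the example, 20 task slots per worker and a one-hour completion window):

```python
import math

def workers_needed(num_dags: int, tasks_per_dag: int, task_seconds: int,
                   parse_seconds: int, slots_per_worker: int = 20,
                   window_seconds: int = 3600) -> int:
    """Total task-seconds divided by one worker's capacity over the window."""
    total = num_dags * tasks_per_dag * (task_seconds + parse_seconds)
    return math.ceil(total / (slots_per_worker * window_seconds))

print(workers_needed(200, 10, 60, 20))   # 160,000 / 72,000 -> 3
print(workers_needed(200, 10, 60, 100))  # 320,000 / 72,000 -> 5
```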

To speed up DAG parsing, follow the recommendations in the following sections.

Remove top-level imports

Top-level code runs every time the DAG is parsed. Avoid importing libraries at the top level when they are not needed to create the DAG object; move those imports into the task functions instead. That way, the import is executed only when the task runs.

Similarly, avoid frequent top-level interactions with databases, such as the metadata database or external systems. Variables used in a DAG may be stored in the metadata database or in a backend such as Secrets Manager, and fetching them at parse time triggers repeated lookups. Instead, use Jinja templating so that variables are resolved at runtime rather than at parse time, avoiding unnecessary parsing overhead.

For instance, see the following code:

import pendulum

from airflow import DAG
from airflow.decorators import task

import numpy as np  # <-- DON'T DO THAT!

with DAG(
    dag_id="example_python_operator",
    schedule=None,
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
    tags=["example"],
) as dag:

    @task()
    def print_array():
        """Print Numpy array."""
        import numpy as np  # <-- INSTEAD DO THIS!

        a = np.arange(15).reshape(3, 5)
        print(a)
        return a

    print_array()

The following example fetches an Airflow variable inside the task, so the lookup happens at runtime instead of at parse time:

from airflow.decorators import task
from airflow.models import Variable

@task
def foo_task():
    foo = Variable.get("foo")  # fetched at runtime, not at parse time
    print(f"Hello, the value of foo is {foo}")

Writing DAGs

Complex DAGs with many tasks and intricate dependencies can significantly impair scheduler performance. To keep your Airflow instances performing well and fully utilized, simplify and streamline your DAGs where possible.

A DAG with a simple linear structure, such as A → B → C, incurs little delay in task scheduling because it has no complex dependencies or nesting. In contrast, a DAG with a deeply nested tree structure and an exponentially growing number of dependent tasks is likely to incur significant scheduling delays.

Dynamic DAGs

Consider a workflow whose DAGs operate on table names retrieved from a database. Without dynamic generation, a developer must define N distinct DAGs for N different tables. Instead, you can generate the DAGs dynamically, as the following code shows:

# Construct dynamic DAG instances
dag_params = getData()
no_of_dags = int(dag_params["no_of_dags"]["N"])

for number in range(no_of_dags):
    dag_id = f"dynperf_t1_{number}"
    default_args = {
        "owner": "airflow",
        "start_date": datetime(2022, 2, 2, 12, number),
    }

This approach streamlines operations and reduces the risk of mistakes: the DAG definitions are generated after querying a database catalog, dynamically creating as many DAGs as there are tables in the database. It achieves the same goal with much less code. The getData() helper retrieves the DAG parameters, in this case from a DynamoDB table:

import boto3

def getData():
    client = boto3.client("dynamodb")
    response = client.get_item(
        TableName="mwaa-dag-creation",
        Key={"key": {"S": "mwaa"}},
    )
    return response["Item"]

Stagger DAG schedules

Running all the DAGs in your environment at the same time increases the number of worker nodes needed to process tasks, driving up compute costs. For DAG runs that are not time-critical, stagger the schedules to spread the load and keep worker utilization high.

DAG folder parsing

Simple DAGs can live in a single Python file, while more complex ones may span multiple files and depend on shared components. You can organize them either in a standard filesystem layout, with the required files in directories under the DAG root, or by packaging a DAG and its supporting Python code into a single ZIP archive. Airflow scans every directory and file in the DAG_FOLDER. An .airflowignore file lets you specify directories or files that Airflow should deliberately skip. This reduces the number of files scanned for DAGs, improving parsing times and overall efficiency.
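For example, an .airflowignore file placed at the root of the DAG folder might look like the following (the entries are hypothetical; by default, each line is treated as a regular expression):

```
# Files and directories the scheduler should not parse
helpers/
.*_test\.py
scratch_.*
```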

Deferrable operators

You can run deferrable operators on Amazon MWAA. A deferrable operator can suspend itself while waiting on an external condition, releasing its worker slot. Fewer occupied slots can mean fewer workers, which lowers compute costs.

Suppose you run many sensors that wait for specific events, each occupying a worker slot. By using deferrable sensors together with worker auto scaling, the environment can quickly scale in unneeded worker instances, minimizing compute costs.

Dynamic task mapping

With dynamic task mapping, a workflow can create tasks at runtime based on current data, rather than the DAG author having to predict the number of tasks in advance. It is similar to defining tasks in a for loop, except that the scheduler does this work based on the output of a previous task instead of the DAG author. Just before a mapped task runs, the scheduler creates N copies of the task, one for each input entry.
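In Airflow you express this with the task.expand() method; the plain-Python sketch below only illustrates the fan-out the scheduler performs, with hypothetical file names, and is not Airflow API:

```python
def get_files():
    # In a real DAG this might list objects in an S3 prefix at runtime.
    return ["a.csv", "b.csv", "c.csv"]

def process(path):
    return f"processed {path}"

# The scheduler creates one task instance per input element -- here, one call each.
results = [process(p) for p in get_files()]
print(results)
```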

Stop and start the environment

You can stop and start your Amazon MWAA environment based on your workload requirements, which results in cost savings. You can do this manually or automate the startup and shutdown of Amazon MWAA environments. For example, you can use AWS CloudFormation to script the creation and deletion of your Amazon MWAA environments while preserving critical metadata.

Conclusion

Implementing these Amazon MWAA best practices can significantly reduce overall costs while maintaining performance and reliability. Key strategies include right-sizing your environment class using CloudWatch metrics, controlling logging and monitoring expenditure, using lookup patterns and caching with the Secrets Manager backend, streamlining DAG code, and stopping and starting environments in response to shifting workload demands. Continuously revisiting and recalibrating these parameters as workloads shift keeps your costs optimized.


About the Authors

As a Senior Solutions Architect at AWS, he specializes in helping customers work backward from their business goals and craft innovative solutions on AWS. Over the course of his career, he has guided numerous customers through data platform transformation initiatives across industries. His core expertise spans data, analytics, and machine learning. In his free time, he enjoys playing sports, binge-watching TV shows, and playing the tabla.

As a Solutions Architect at AWS, she leverages her expertise in data analytics and generative AI to drive innovative solutions. She partners with customers to identify and address their business challenges by designing data-driven solutions built on the latest technologies. She is passionate about crafting innovative, sustainable, and cost-effective solutions that propel businesses forward through digital transformation.

As a Senior Solutions Architect at AWS, he brings extensive expertise across AI/ML, serverless, and data analytics. He excels at designing tailored solutions that prioritize security, scalability, reliability, and cost-effectiveness.
