Tuesday, April 1, 2025

Automate data loading from your database into Amazon Redshift using AWS Database Migration Service (DMS), AWS Step Functions, and the Redshift Data API

Amazon Redshift is a fast, fully managed, secure, and highly scalable cloud data warehouse that makes it simple and cost-effective to analyze all of your data using standard SQL and your existing ETL (extract, transform, load), business intelligence (BI), and reporting tools. Tens of thousands of customers use Amazon Redshift to process exabytes of data per day and power analytics workloads such as BI, predictive analytics, and real-time streaming analytics.

As data volumes grow, making the right data available in the right place at the right time has become an essential part of running an effective data warehouse. A fully automated and highly scalable ETL process significantly reduces the operational effort of managing traditional ETL pipelines and keeps your data warehouse up to date.

There are two common approaches to ingesting data into a data warehouse:

  • Full load – This approach reloads all the data in the data warehouse or a given dataset.
  • Incremental load – This approach applies only the new or changed data to the existing data in the data warehouse.

This post covers how to automate the ingestion of source data that changes frequently and has no mechanism to track those changes, for customers who want to use that data in Amazon Redshift. Examples of such data include untracked product details and supplier payment transactions.

We outline an approach for building an automated extract-and-load process from various relational database systems into a data warehouse for full-load-only operations. A full load is performed from SQL Server to Amazon Redshift using AWS Database Migration Service (AWS DMS). When AWS DMS signals that the full load is complete, automated ETL processing is triggered in Amazon Redshift. AWS Step Functions is used to orchestrate this ETL pipeline. Alternatively, you could use Amazon Managed Workflows for Apache Airflow (Amazon MWAA), a managed orchestration service that makes it easier to set up and operate end-to-end data pipelines in the cloud.

Solution overview

The workflow comprises the following stages:

  1. An AWS DMS full load task replicates the full load data from the source database to staging tables in Amazon Redshift.
  2. When the replication task stops after the full load finishes, AWS DMS publishes a ReplicationTaskStopped event to Amazon EventBridge, which triggers the associated EventBridge rule.
  3. EventBridge routes the event to a Step Functions state machine.
  4. The state machine calls a predefined Redshift stored procedure through the Redshift Data API, which loads the dataset from the staging tables into the target production tables. The Data API lets you access Redshift data from web services-based applications (a minimal sketch of this call pattern follows this list).
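The heavy lifting in step 4 is a simple submit-and-poll pattern against the Redshift Data API. The following Python (boto3) sketch shows that pattern outside of Step Functions, assuming the sp_load_cust_dim stored procedure created later in this post; the cluster identifier, database name, and secret ARN are placeholder values you would replace with your own.

```python
import time

import boto3

# Redshift Data API client (region and credentials come from your environment)
redshift_data = boto3.client("redshift-data")

# Placeholder connection details -- replace with your own cluster, database, and secret
CLUSTER_ID = "redshiftcluster-abcd"
DATABASE = "dev"
SECRET_ARN = "arn:aws:secretsmanager:us-west-2:111122223333:secret:rs-cluster-secret-abcd"

# Submit the stored procedure call; the Data API runs it asynchronously
response = redshift_data.execute_statement(
    ClusterIdentifier=CLUSTER_ID,
    Database=DATABASE,
    SecretArn=SECRET_ARN,
    Sql="CALL dbo.sp_load_cust_dim();",
)

# Poll describe_statement until the statement finishes, which is exactly
# what the Wait/Choice loop in the state machine automates
while True:
    status = redshift_data.describe_statement(Id=response["Id"])["Status"]
    if status in ("FINISHED", "FAILED", "ABORTED"):
        print(f"Statement ended with status {status}")
        break
    time.sleep(5)
```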


The following diagram illustrates the end-to-end solution.

We show you how to create an AWS DMS full load task, configure ETL orchestration on Amazon Redshift, create an EventBridge rule, and review the results.

Prerequisites

To complete this walkthrough, you need the following prerequisites:

  • An AWS account with permissions to use AWS DMS, Amazon Redshift, AWS Step Functions, Amazon EventBridge, and AWS Secrets Manager
  • A source database (this post uses SQL Server) that is reachable from an AWS DMS replication instance
  • An AWS DMS replication instance, a source endpoint for the source database, and a target endpoint for the Redshift cluster
  • An Amazon Redshift cluster, with its connection credentials stored in AWS Secrets Manager so that the Redshift Data API can connect to it

Create the AWS DMS full load task

Complete the following steps to create the AWS DMS full load task:

  1. On the AWS DMS console, select Database migration tasks in the navigation pane.
  2. Select .
  3. Enter a name for your task, such as dms-full-dump-task.
  4. Select your replication instance.
  5. Select your source endpoint.
  6. Select your target endpoint.
  7. For , select .

  1. Under Table mappings, add a selection rule.
  2. For , select .
  3. For the schema name, enter the name of your source schema (for example, dms_sample).
  4. Select Create task.

The following screenshot shows the completed task on the AWS DMS console.
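If you prefer to script the task rather than use the console, the following boto3 sketch creates and starts an equivalent full-load-only task. The endpoint and replication instance ARNs are placeholder assumptions, and the table mapping includes every table in the dms_sample schema used in this example.

```python
import json

import boto3

dms = boto3.client("dms")

# Placeholder ARNs -- replace with the ARNs of your own DMS resources
SOURCE_ENDPOINT_ARN = "arn:aws:dms:us-west-2:111122223333:endpoint:SOURCEEXAMPLE"
TARGET_ENDPOINT_ARN = "arn:aws:dms:us-west-2:111122223333:endpoint:TARGETEXAMPLE"
REPLICATION_INSTANCE_ARN = "arn:aws:dms:us-west-2:111122223333:rep:INSTANCEEXAMPLE"

# Selection rule: include every table in the dms_sample schema
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-dms-sample",
            "object-locator": {"schema-name": "dms_sample", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

# Create a full-load-only replication task
task = dms.create_replication_task(
    ReplicationTaskIdentifier="dms-full-dump-task",
    SourceEndpointArn=SOURCE_ENDPOINT_ARN,
    TargetEndpointArn=TARGET_ENDPOINT_ARN,
    ReplicationInstanceArn=REPLICATION_INSTANCE_ARN,
    MigrationType="full-load",
    TableMappings=json.dumps(table_mappings),
)
task_arn = task["ReplicationTask"]["ReplicationTaskArn"]

# Wait until the task is ready, then kick off the full load
dms.get_waiter("replication_task_ready").wait(
    Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}]
)
dms.start_replication_task(
    ReplicationTaskArn=task_arn,
    StartReplicationTaskType="start-replication",
)
```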

Create Redshift tables


Create the following tables on the Redshift cluster:

  • dim_cust – Stores customer attributes:

CREATE TABLE dbo.dim_cust (
    cust_key integer ENCODE zstd DISTKEY,
    cust_id character varying(10),
    cust_name character varying(100),
    cust_city character varying(50),
    cust_rev_flg character varying(1)
)
DISTSTYLE KEY;

  • fact_sales – Stores customer sales transactions:

CREATE TABLE dbo.fact_sales (
    order_number VARCHAR(20),
    cust_key INTEGER,
    order_amt DECIMAL(18, 2)
)
DISTSTYLE ALL;

  • fact_sales_stg – Stores daily incremental customer sales transactions:

CREATE TABLE dbo.fact_sales_stg (
    order_number VARCHAR(20),
    cust_id VARCHAR(10),
    order_amt DECIMAL(18, 2)
)
DISTSTYLE ALL;

Load sample data into the sales staging table with the following insert statement:


INSERT INTO dbo.fact_sales_stg (order_number, cust_id, order_amt)
VALUES
  (100, 1, 200),
  (101, 1, 300),
  (102, 2, 25),
  (103, 2, 35),
  (104, 3, 80),
  (105, 3, 45);

Create the stored procedures

Create the following stored procedures on the Redshift cluster to process the customer and sales transaction data:

  • sp_load_cust_dim – Compares the customer dimension with the incremental customer data in the staging area and loads the customer dimension:

CREATE OR REPLACE PROCEDURE dbo.sp_load_cust_dim()
AS $$
BEGIN
  TRUNCATE TABLE dbo.dim_cust;
  INSERT INTO dbo.dim_cust (cust_key, cust_id, cust_name, cust_city)
  VALUES
    (1, 100, 'abc', 'chicago'),
    (2, 101, 'xyz', 'dallas'),
    (3, 102, 'yrt', 'ny');
  UPDATE dbo.dim_cust
  SET cust_rev_flg = CASE WHEN cust_city = 'ny' THEN 'Y' ELSE 'N' END
  WHERE cust_rev_flg IS NULL;
END;
$$ LANGUAGE plpgsql;

  • sp_load_fact_sales – Transforms the incremental order data by joining it with the customer dimension and loads the dimension keys into the final sales fact table (a quick way to test both procedures follows this list):

CREATE OR REPLACE PROCEDURE dbo.sp_load_fact_sales()
AS $$
BEGIN
  INSERT INTO dbo.fact_sales (order_number, cust_key, order_amt)
  SELECT
      s.order_number,
      c.cust_key,
      s.order_amt
  FROM dbo.fact_sales_stg s
  JOIN dbo.dim_cust c
    ON s.cust_id = c.cust_id;
END;
$$ LANGUAGE plpgsql;
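Before wiring the procedures into the state machine, you can test them by calling them through the Redshift Data API and reading back a row count. The following is a minimal sketch of that check; the cluster identifier, database, and secret ARN are placeholders.

```python
import time

import boto3

redshift_data = boto3.client("redshift-data")

# Placeholder connection details -- replace with your own
CONN = {
    "ClusterIdentifier": "redshiftcluster-abcd",
    "Database": "dev",
    "SecretArn": "arn:aws:secretsmanager:us-west-2:111122223333:secret:rs-cluster-secret-abcd",
}


def run_sql(sql):
    """Submit a statement and block until it finishes, returning its statement ID."""
    statement_id = redshift_data.execute_statement(Sql=sql, **CONN)["Id"]
    while True:
        status = redshift_data.describe_statement(Id=statement_id)["Status"]
        if status == "FINISHED":
            return statement_id
        if status in ("FAILED", "ABORTED"):
            raise RuntimeError(f"Statement {statement_id} ended with status {status}")
        time.sleep(2)


# Run both procedures in order, then verify the fact table row count
run_sql("CALL dbo.sp_load_cust_dim();")
run_sql("CALL dbo.sp_load_fact_sales();")
count_id = run_sql("SELECT COUNT(*) FROM dbo.fact_sales;")
result = redshift_data.get_statement_result(Id=count_id)
print("fact_sales rows:", result["Records"][0][0]["longValue"])
```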


Create the Step Functions state machine

Next, create a Step Functions state machine that orchestrates the ETL process for the customer sales data in Amazon Redshift.

The state machine is triggered as soon as the AWS DMS full load task on the customer table completes. Complete the following steps to create the state machine:

  1. On the Step Functions console, select State machines in the navigation pane.
  2. Select .
  3. For , select .
  4. To import the workflow definition of the state machine, select the option from the dropdown menu.

  1. Open your preferred text editor and save the following code as a file with the .asl extension (for example, redshift-elt-load-customer-sales.asl). Replace the Redshift cluster identifier and the secret ARN with the values for your own cluster.
{ "Remark": "State Machine for Buyer Gross Sales Transactions ETL Course", "StartAt": "Load_Customer_Dim", "States": { "Load_Customer_Dim": { "Job": { "ClusterIdentifier": "redshiftcluster-abcd", "Database": "dev", "Sql": "dbo.sp_load_cust_dim()", "SecretArn": "arn:aws:secretsmanager:us-west-2:xxx:secret:rs-cluster-secret-abcd" }, "Transition To": "Wait on Load_Customer_Dim" }, "Wait on Load_Customer_Dim": { "Wait": 30, "Next": "Check_Status_Load_Customer_Dim" }, "Check_Status_Load_Customer_Dim": { "Job": { "Id.$": "$.Id" }, "Transition To": "Selection", "Resource": "arn:aws:states:::aws-sdk:redshiftdata:describeStatement" }, "Selection": { "Job": { "Variable": "$.Status", "Not": { "StringEquals": "FINISHED" }, "Next": "Wait on Load_Customer_Dim" }, "Default": "Load_Sales_Fact" }, "Load_Sales_Fact": { "Job": { "Finish": true, "ClusterIdentifier": "redshiftcluster-abcdef", "Database": "dev", "Sql": "dbo.sp_load_fact_sales()", "SecretArn": "arn:aws:secretsmanager:us-west-2:xxx:secret:rs-cluster-secret-abcd" }, "Resource": "arn:aws:states:::aws-sdk:redshiftdata:executeStatement" } }
  1. Create a new state machine by importing the ASL (Amazon States Language) file you saved. Alternatively, you can create the state machine programmatically, as sketched after these steps.

  1. Enter a name for the state machine (for example, redshift-elt-load-customer-sales).
  2. Select .
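As an alternative to importing the file on the console, you could create the state machine with a few lines of boto3, as in the following sketch. It assumes the ASL file shown earlier is saved locally and that you have an IAM role (placeholder ARN below) that Step Functions can assume to call the Redshift Data API.

```python
import boto3

sfn = boto3.client("stepfunctions")

# Placeholder execution role -- it must allow redshift-data:ExecuteStatement
# and redshift-data:DescribeStatement so the workflow can call the Data API
ROLE_ARN = "arn:aws:iam::111122223333:role/redshift-elt-stepfunctions-role"

# Read the Amazon States Language definition saved earlier
with open("redshift-elt-load-customer-sales.asl") as f:
    definition = f.read()

response = sfn.create_state_machine(
    name="redshift-elt-load-customer-sales",
    definition=definition,
    roleArn=ROLE_ARN,
)
print("State machine ARN:", response["stateMachineArn"])
```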

After you create the state machine, you can review its details, as shown in the following screenshot.

The following diagram illustrates the state machine workflow.

The state machine includes the following steps:

  • Load_Customer_Dim – Performs the following actions:
    • Passes the stored procedure sp_load_cust_dim to the executeStatement API to run in the Redshift cluster, which loads the incremental data into the customer dimension.
    • Returns the SQL statement ID to the state machine, which uses it to track the query status.
  • Wait_on_Load_Customer_Dim – Waits for 30 seconds before the status of the query is checked.
  • Check_Status_Load_Customer_Dim – Invokes the Data API describeStatement call to get the status of the executeStatement call.
  • Choice – Routes the next step in the ETL workflow depending on the query status:
    • FINISHED – Passes the stored procedure sp_load_fact_sales to the executeStatement API on the Redshift cluster, which loads the incremental sales data and populates the keys from the customer dimension.
    • Otherwise – Goes back to the Wait_on_Load_Customer_Dim step until the SQL statement finishes.

The state machine redshift-elt-load-customer-sales processes the dim_cust, fact_sales_stg, and fact_sales tables whenever the EventBridge rule triggers it.

Optionally, when the state machine completes, you can set up event-based notifications to trigger downstream actions, such as an Amazon SNS notification or an additional ETL process.
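For example, one way to set this up is an EventBridge rule on the Step Functions execution status change event that publishes to an SNS topic. The following sketch assumes a state machine ARN and an SNS topic ARN as placeholders, and the topic's access policy must allow EventBridge to publish to it.

```python
import json

import boto3

events = boto3.client("events")

# Placeholder ARNs -- replace with your state machine and SNS topic
STATE_MACHINE_ARN = (
    "arn:aws:states:us-west-2:111122223333:stateMachine:redshift-elt-load-customer-sales"
)
SNS_TOPIC_ARN = "arn:aws:sns:us-west-2:111122223333:redshift-elt-notifications"

# Match successful or failed executions of this specific state machine
pattern = {
    "source": ["aws.states"],
    "detail-type": ["Step Functions Execution Status Change"],
    "detail": {
        "status": ["SUCCEEDED", "FAILED"],
        "stateMachineArn": [STATE_MACHINE_ARN],
    },
}

events.put_rule(
    Name="redshift-elt-load-complete",
    EventPattern=json.dumps(pattern),
    State="ENABLED",
)

# SNS targets rely on the topic's access policy rather than an execution role
events.put_targets(
    Rule="redshift-elt-load-complete",
    Targets=[{"Id": "notify-sns", "Arn": SNS_TOPIC_ARN}],
)
```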

Create an EventBridge rule

When the full load is complete, EventBridge sends the event notification to the Step Functions state machine. You can turn event notifications on or off in EventBridge.

To create an EventBridge rule:


  1. On the Amazon EventBridge console, select Rules in the navigation pane.
  2. Select .
  3. Enter a name for the rule (for example, dms-test).
  4. For the rule type, select a rule with an event pattern.

  5. For the event bus, select the default event bus. When an AWS service in your account emits an event, it always goes to your account's default event bus.
  6. For , select .
  7. Select .
  8. For , select.
  9. For , choose .
  10. For , select .
  11. For , select .
  12. For , select .
  13. Enter the following JSON event pattern, which matches the REPLICATION_TASK_STOPPED event that AWS DMS emits when a replication task stops:
{ "supply": ["aws.dms"], "element": { "eventId": ["DMS-EVENT-0079"], "eventType": ["Replication Task Stopped"], "detailMessage": ["Stop Reason: Full Load Only Finished"], "sort": ["replication_task"], "class": ["State Change"] } 

  1. For , select .
  2. For , select .
  3. For , enter redshift-elt-load-customer-sales.
  4. Select .

The following screenshot shows the details of the completed rule.
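You can also define the same rule and target with boto3, as in the following sketch. The state machine ARN and the IAM role that allows EventBridge to start the execution are placeholder assumptions.

```python
import json

import boto3

events = boto3.client("events")

# Placeholder ARNs -- replace with your state machine and an IAM role that
# grants EventBridge the states:StartExecution permission on it
STATE_MACHINE_ARN = (
    "arn:aws:states:us-west-2:111122223333:stateMachine:redshift-elt-load-customer-sales"
)
EVENTBRIDGE_ROLE_ARN = "arn:aws:iam::111122223333:role/eventbridge-invoke-stepfunctions"

# The same event pattern as above: the replication task stopped after a full-load-only run
event_pattern = {
    "source": ["aws.dms"],
    "detail": {
        "eventId": ["DMS-EVENT-0079"],
        "eventType": ["REPLICATION_TASK_STOPPED"],
        "detailMessage": ["Stop Reason FULL_LOAD_ONLY_FINISHED"],
        "type": ["REPLICATION_TASK"],
        "category": ["StateChange"],
    },
}

events.put_rule(
    Name="dms-test",
    EventPattern=json.dumps(event_pattern),
    State="ENABLED",
)

events.put_targets(
    Rule="dms-test",
    Targets=[
        {
            "Id": "redshift-elt-state-machine",
            "Arn": STATE_MACHINE_ARN,
            "RoleArn": EVENTBRIDGE_ROLE_ARN,
        }
    ],
)
```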

Run the AWS DMS full load task

Run the AWS DMS task to perform the full load. The task extracts the full set of data from the source database and loads it into the Redshift cluster.

The following screenshot shows the load statistics for the full load of the customer table.

AWS DMS provides notifications when significant events occur, such as the completion of a full load or when a replication task stops.

After the full load is complete, AWS DMS sends the replication task stopped event to the default event bus in your account. The following screenshot shows the invocation of the target Step Functions state machine by the rule you created earlier.

The Step Functions state machine is configured as a target in Amazon EventBridge, so EventBridge invokes the Step Functions workflow after the AWS DMS full load task completes.

Validate the state machine orchestration

After the customer sales data pipeline completes end to end, you can review each step of the Step Functions state machine, as shown in the following screenshots.
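In addition to the console, a quick way to confirm the orchestration is to query the most recent execution with boto3, as in the following sketch; the state machine ARN is a placeholder.

```python
import boto3

sfn = boto3.client("stepfunctions")

# Placeholder ARN -- replace with the ARN of your state machine
STATE_MACHINE_ARN = (
    "arn:aws:states:us-west-2:111122223333:stateMachine:redshift-elt-load-customer-sales"
)

# list_executions returns the most recent execution first
executions = sfn.list_executions(stateMachineArn=STATE_MACHINE_ARN, maxResults=1)["executions"]
if executions:
    detail = sfn.describe_execution(executionArn=executions[0]["executionArn"])
    print("Execution:", detail["name"])
    print("Status:", detail["status"])
    print("Started:", detail["startDate"])
else:
    print("No executions found yet -- run the AWS DMS full load task first.")
```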

Limitations

The Redshift Data API and the Step Functions AWS SDK integration provide a robust mechanism to build highly distributed ETL applications with minimal development overhead. Keep the service quotas and limitations of the Data API and Step Functions in mind when you use them together.

Clean up

To avoid incurring future charges, delete the Redshift cluster, the AWS DMS full load task, the AWS DMS replication instance, and the Step Functions state machine that you created as part of this post.

Conclusion

In this post, we showed how to build a scalable ETL orchestration process that extracts and processes large volumes of data from operational data stores into Amazon Redshift using the Redshift Data API, Amazon EventBridge, AWS Step Functions with AWS SDK integration, and Redshift stored procedures.

To learn more about the Redshift Data API, refer to the documentation.


About the authors

An Analytics Specialist Solutions Architect based in San Francisco. He has helped customers build scalable data warehouse and big data solutions for over 16 years, and he focuses on designing efficient end-to-end solutions on AWS. In his spare time, he enjoys reading, walking, and practicing yoga.

An Analytics Specialist Solutions Architect at AWS based in Dallas. He works with customers to design sustainable, performant, and scalable analytics solutions, and he has spent over 15 years building database and data warehouse solutions.

A Senior Solutions Architect at AWS focusing on Amazon OpenSearch Service. He is passionate about data architecture and helps customers build scalable analytics solutions on AWS.
