Amazon Redshift is a fast, petabyte-scale cloud data warehouse that tens of thousands of customers rely on to power their analytics workloads. Thousands of customers use Amazon Redshift data sharing to enable instant, granular, and fast data access across Redshift provisioned clusters and serverless workgroups. This allows you to scale your read workloads to thousands of concurrent users without having to move or copy the data.
Now, we’re announcing the general availability of Amazon Redshift multi-data warehouse writes through data sharing. This new capability lets you scale your write workloads and achieve better performance for ETL jobs by using multiple warehouses of different types and sizes based on your workload needs. You can make your ETL job runs more predictable by distributing them across different data warehouses with just a few clicks. Other benefits include the ability to monitor and control costs for each data warehouse, and enabling data collaboration across teams because they can write to one another’s databases. The data is live and available across all warehouses as soon as it’s committed, whether it’s written in a single account or across accounts and Regions. To learn more about the benefits of using multiple warehouses to write to the same databases, refer to Multi-warehouse writes through data sharing.
As organizations migrate workloads to AWS, they also look for mechanisms to manage costs efficiently. A good understanding of the cost of running your business workloads, and the value those workloads bring to the organization, gives you confidence in the efficiency of your spending on AWS.
This post demonstrates how you can develop a business chargeback model by using the multi-warehouse architecture and data sharing capabilities of Amazon Redshift. You can attribute cost to individual business units and, at the same time, gain more insights to drive efficient spending.
Use case
To illustrate a practical use case, consider AnyCompany, a retail company that uses Amazon Redshift provisioned clusters and serverless workgroups, each tailored to the needs of a specific business unit, such as the sales, marketing, and development teams. AnyCompany is a large enterprise organization that has already migrated a large volume of its workloads to Amazon Redshift, and is now looking to break down data silos further by moving additional business-critical workloads onto the same platform. AnyCompany has a highly technical community of business users who want to keep control of the pipelines that load their own data and combine it with business insights. To break down data silos and eliminate duplication, the enterprise IT team requires all business units to write to a single, centralized database under its governance. In this setup, each team is responsible for processing its data and writing it to the same or different tables hosted in the centralized database. Each team uses its own dedicated Redshift workgroup or cluster for compute, which allows cost to be attributed to individual cost centers through separate billing.
In this post, we walk you through the process of using multi-warehouse writes to the same database via data sharing, and building an end-to-end business chargeback model. This chargeback model helps you attribute cost to individual business units, gain better visibility into your spending, and implement more cost-efficient strategies.
Solution overview
The following diagram illustrates the solution architecture.
The workflow comprises the following sequential steps:
- Data ingestion from different sources is isolated by using dedicated Redshift Serverless workgroups and a Redshift provisioned cluster.
- Producers write data into the central ETL database, each using its own schemas and tables based on its needs. The Sales and Marketing workgroups write to their respective schemas in the ETL provisioned cluster’s managed storage: the Sales team populates the sales schema, and the Marketing team populates the marketing schema. They can also apply transformations at the schema object level based on their business requirements.
- The Redshift Serverless producer workgroups and the Redshift producer cluster can insert and update data in the same database.
- Audit records for each load are written to the etl_audit_logs table, stored in the audit (etl) schema of the primary ETL database.
- The same Redshift Serverless workgroups and provisioned cluster used for ingestion are also used for consumption, and are maintained and billed separately by the respective business units.
To implement this solution, you complete the following high-level steps:
- Set up the primary ETL cluster (producer):
  - Create the datashares.
  - Grant permissions on schemas and objects to the datashares.
  - Grant permissions to the Sales and Marketing consumer namespaces.
- Set up the Sales warehouse (consumer):
  - Create a sales database from the datashare.
  - Start writing to the datashare database.
- Set up the Marketing warehouse (consumer):
  - Create a marketing database from the datashare.
  - Start writing to the datashare database.
- Set up and run ETL jobs in the ETL producer.
- Calculate chargeback to business units.
Prerequisites
To follow along with this post, you should have the following prerequisites:
- Three Redshift warehouses: one provisioned cluster and two serverless workgroups, in the same account and AWS Region.
- Access to each of the warehouses.
- An IAM role that can ingest data from Amazon S3 into Amazon Redshift.
- For cross-account sharing only, an IAM user or role with permission to authorize datashares. For the IAM policy, refer to the Amazon Redshift data sharing documentation.
Set up the primary ETL cluster (producer)
In this section, we show how to set up the primary ETL producer cluster to store your data.
Connect to the producer
Complete the following steps to connect to the producer:
- On the Amazon Redshift console, choose Query Editor V2 in the navigation pane.
In Query Editor V2, you can see all the warehouses you have access to in the left pane. You can expand them to view their databases.
- Connect to your primary ETL warehouse using a superuser or admin credentials.
- Create the prod database by running the following command:

```sql
CREATE DATABASE prod;
```
- After you create the prod database, switch your database connection to prod. You might need to refresh the page in Query Editor V2 to see the prod database.
- Run the following commands to create the three schemas you plan to share:
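A minimal sketch, assuming the schema names etl, sales, and marketing used throughout this post:

```sql
-- Schemas in the prod database to be shared with the consumers
CREATE SCHEMA etl;       -- shared with both Sales and Marketing
CREATE SCHEMA sales;     -- shared with the Sales consumer
CREATE SCHEMA marketing; -- shared with the Marketing consumer
```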
- Create the tables in the etl schema to share with the Sales and Marketing consumer warehouses. These are standard DDL statements sourced from the AWS Labs sample database, with updated table names.
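The full DDL isn’t reproduced here; the following is a sketch of the two etl tables referenced later in this walkthrough (inventory and etl_audit_logs), with column definitions assumed for illustration:

```sql
-- Hypothetical shapes for the shared etl tables; replace with your actual DDL
CREATE TABLE etl.inventory (
    inv_date_sk          INTEGER NOT NULL,  -- assumed, TPC-DS-style columns
    inv_item_sk          INTEGER NOT NULL,
    inv_warehouse_sk     INTEGER NOT NULL,
    inv_quantity_on_hand INTEGER
);

CREATE TABLE etl.etl_audit_logs (
    job_name    VARCHAR(100),               -- ETL job that performed the load
    table_name  VARCHAR(100),               -- target table that was loaded
    rows_loaded BIGINT,                     -- number of rows written
    load_ts     TIMESTAMP DEFAULT GETDATE() -- load completion time
);
```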
- Create the tables in the sales schema to share with the Sales consumer warehouse.
- Create the tables in the marketing schema to share with the Marketing consumer warehouse.
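Likewise, the full DDL is omitted; a sketch of the two marketing tables referenced later in this post (customer and promotion), with assumed columns:

```sql
-- Hypothetical shapes for the shared marketing tables
CREATE TABLE marketing.customer (
    c_customer_sk INTEGER NOT NULL, -- assumed columns
    c_first_name  VARCHAR(50),
    c_last_name   VARCHAR(50),
    c_email       VARCHAR(100)
);

CREATE TABLE marketing.promotion (
    p_promo_sk   INTEGER NOT NULL,
    p_promo_name VARCHAR(50),
    p_start_date DATE,
    p_end_date   DATE
);
```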
Create the datashares

Create two datashares, one for the Sales business unit and one for the Marketing business unit:
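The post doesn’t give the exact datashare names; a minimal sketch, assuming the names sales_ds and marketing_ds (also used in the later examples):

```sql
-- One datashare per consuming business unit
CREATE DATASHARE sales_ds;
CREATE DATASHARE marketing_ds;
```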
Grant permissions on schemas to the datashares

To add objects with permissions to a datashare, use the GRANT syntax and specify the datashare you want to grant the permissions to:
- Allow the datashare consumers (the Sales and Marketing business units) to use objects added to the etl schema.
- Allow the datashare consumer for the Sales business unit to use objects added to the sales schema.
- Allow the datashare consumer for the Marketing business unit to use objects added to the marketing schema.
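A combined sketch of these grants, using the assumed datashare names from earlier:

```sql
-- The etl schema is shared with both business units
GRANT USAGE ON SCHEMA etl TO DATASHARE sales_ds;
GRANT USAGE ON SCHEMA etl TO DATASHARE marketing_ds;

-- Each business unit also gets its own schema
GRANT USAGE ON SCHEMA sales TO DATASHARE sales_ds;
GRANT USAGE ON SCHEMA marketing TO DATASHARE marketing_ds;
```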
Grant permissions on tables to the datashares

You can now grant access on tables to the datashares using the GRANT syntax, specifying the permissions and the datashare:
- Grant SELECT and INSERT scoped privileges on the etl_audit_logs table to the Sales and Marketing datashares.
- Grant ALL privileges on all tables in the sales schema to the Sales datashare.
- Grant ALL privileges on all tables in the marketing schema to the Marketing datashare.
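A sketch of these grants:

```sql
-- Both business units write audit records to the shared audit table
GRANT SELECT, INSERT ON TABLE etl.etl_audit_logs TO DATASHARE sales_ds;
GRANT SELECT, INSERT ON TABLE etl.etl_audit_logs TO DATASHARE marketing_ds;

-- Each business unit gets full access to the tables in its own schema
GRANT ALL ON ALL TABLES IN SCHEMA sales TO DATASHARE sales_ds;
GRANT ALL ON ALL TABLES IN SCHEMA marketing TO DATASHARE marketing_ds;
```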
Optionally, you can share new objects automatically. The following approach automatically adds new objects in the etl, sales, and marketing schemas to the two datashares:
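A sketch using the ALTER DATASHARE ... SET INCLUDENEW syntax:

```sql
-- Automatically add new objects in these schemas to the datashares
ALTER DATASHARE sales_ds SET INCLUDENEW = TRUE FOR SCHEMA etl;
ALTER DATASHARE sales_ds SET INCLUDENEW = TRUE FOR SCHEMA sales;
ALTER DATASHARE marketing_ds SET INCLUDENEW = TRUE FOR SCHEMA etl;
ALTER DATASHARE marketing_ds SET INCLUDENEW = TRUE FOR SCHEMA marketing;
```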
Grant permissions to the Sales and Marketing namespaces

You can grant permissions to the Sales and Marketing namespaces by specifying their namespace IDs. There are two ways to find the namespace IDs:
- Find the namespace ID on the namespace details page on the Redshift Serverless console.
- Run the following SQL command on each consumer:

```sql
select current_namespace;
```
Then grant usage on each datashare to the corresponding consumer namespace, replacing the placeholder with the namespace ID of your Sales or Marketing warehouse.
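A sketch with placeholder namespace IDs:

```sql
-- Replace the placeholders with your consumer namespace IDs
GRANT USAGE ON DATASHARE sales_ds TO NAMESPACE '<sales-consumer-namespace-id>';
GRANT USAGE ON DATASHARE marketing_ds TO NAMESPACE '<marketing-consumer-namespace-id>';
```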
Set up and run an ETL job in the ETL producer

Complete the following steps to set up and run an ETL job in the ETL producer:
- Create a stored procedure that performs the following actions (see the sketch after these steps):
  - Copies data from the S3 bucket to the inventory table in the etl schema.
  - Inserts an audit record in the etl_audit_logs table in the etl schema.
- Run the stored procedure.
- Validate the load by querying the etl_audit_logs table.
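The post’s exact procedure isn’t reproduced here; the following is a minimal sketch, assuming a hypothetical procedure name, S3 path, and IAM role:

```sql
-- Hypothetical ETL procedure; replace the bucket and IAM role with your own
CREATE OR REPLACE PROCEDURE etl.sp_load_inventory()
AS $$
BEGIN
    -- Load the raw files into the shared inventory table
    COPY etl.inventory
    FROM 's3://<your-bucket>/inventory/'
    IAM_ROLE '<your-redshift-iam-role-arn>'
    DELIMITER '|';

    -- Record the load (post-load row count) in the shared audit table
    INSERT INTO etl.etl_audit_logs (job_name, table_name, rows_loaded, load_ts)
    SELECT 'sp_load_inventory', 'etl.inventory', COUNT(*), GETDATE()
    FROM etl.inventory;
END;
$$ LANGUAGE plpgsql;

-- Run the ETL job
CALL etl.sp_load_inventory();
```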
Set up the Sales warehouse (consumer)

At this point, you’re ready to set up the Sales consumer warehouse and start writing data to the shared objects in the ETL producer namespace.
Create a sales database from the datashare

The following syntax creates a local database from the datashare:

```sql
CREATE DATABASE <database_name>
FROM DATASHARE <datashare_name> OF NAMESPACE '<producer_namespace_id>'
WITH PERMISSIONS;
```
Complete the following steps to create the database:
- Connect to the Sales consumer warehouse in Query Editor V2.
- Run the command show datashares; to view the etl and sales datashares along with the datashare producer’s namespace.
- Create a database from the datashare namespace, as shown in the following snippet:
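A sketch, assuming the database name sales_db:

```sql
CREATE DATABASE sales_db
FROM DATASHARE sales_ds OF NAMESPACE '<producer-namespace-id>'
WITH PERMISSIONS;
```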
The WITH PERMISSIONS clause lets you grant granular permissions to individual database users and roles. Without it, when you grant usage permissions on the datashare database, users and roles get all permissions on all objects within the datashare database.
Start writing to the datashare database
In this example, we show how to write to the datashare database using the use <database_name> command and using three-part notation: <database_name>.<schema_name>.<table_name>.
First, switch your connection to the datashare database. Run the following command:
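A sketch, with the database name assumed from earlier:

```sql
USE sales_db;
```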
Insert data into the datashare tables

To insert the data, complete the following steps:
- Copy data from the AWS Labs public S3 bucket into the tables in the producer’s sales schema (see the combined sketch after these steps).
- Insert an entry in the etl_audit_logs table in the producer’s etl schema, using three-part notation.
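A sketch of both steps, with a hypothetical table name, bucket path, and IAM role:

```sql
-- Write into the shared sales schema using three-part notation
COPY sales_db.sales.store_sales          -- hypothetical table name
FROM 's3://<awslabs-public-bucket>/<prefix>/'
IAM_ROLE '<your-redshift-iam-role-arn>'
DELIMITER '|';

-- Log the load in the shared audit table, also with three-part notation
INSERT INTO sales_db.etl.etl_audit_logs (job_name, table_name, rows_loaded, load_ts)
VALUES ('sales_copy_job', 'sales.store_sales', 0, GETDATE()); -- row count elided
```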
Set up the Marketing warehouse (consumer)
Now, set up your Marketing consumer warehouse to write data to the shared objects in the ETL producer namespace. The steps are similar to those for setting up the Sales warehouse consumer.
Create a marketing database from the datashare

Complete the following steps to create a local database from the datashare:
- Connect to the Marketing consumer warehouse in Query Editor V2.
- Run the command show datashares; to view the etl and marketing datashares along with the datashare producer’s namespace.
- Create a database from the datashare namespace, as shown in the following snippet:
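A sketch mirroring the sales setup, with assumed names:

```sql
CREATE DATABASE marketing_db
FROM DATASHARE marketing_ds OF NAMESPACE '<producer-namespace-id>'
WITH PERMISSIONS;
```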
Start writing to the datashare database
This time, you insert data into the datashare database by calling a stored procedure, as sketched below.
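A minimal sketch of such a procedure on the Marketing consumer, with hypothetical names and paths; it writes to the producer’s database through three-part notation:

```sql
-- Hypothetical procedure on the Marketing consumer
CREATE OR REPLACE PROCEDURE public.sp_load_marketing()
AS $$
BEGIN
    -- Write to the shared marketing schema in the producer database
    COPY marketing_db.marketing.customer
    FROM 's3://<your-bucket>/customer/'
    IAM_ROLE '<your-redshift-iam-role-arn>'
    DELIMITER '|';

    -- Audit the load in the shared etl schema
    INSERT INTO marketing_db.etl.etl_audit_logs (job_name, table_name, rows_loaded, load_ts)
    VALUES ('sp_load_marketing', 'marketing.customer', 0, GETDATE()); -- row count elided
END;
$$ LANGUAGE plpgsql;

CALL public.sp_load_marketing();
```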
Set up and run an ETL job in the ETL producer

Switch back to the ETL producer warehouse and complete the following steps:
- Create a stored procedure that performs the following actions:
- Copies data from the Amazon S3 bucket to the customer and promotion tables in the marketing schema of the producer’s namespace.
- Inserts an audit record in the etl_audit_logs table in the etl schema of the producer’s namespace.
- Run the stored procedure:
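With a hypothetical procedure name, the call would look like the following:

```sql
CALL etl.sp_load_marketing_tables(); -- hypothetical name
```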
At this point, you have ingested data into the primary ETL namespace. You can query the tables in the etl, sales, and marketing schemas from both the ETL producer warehouse and the Sales and Marketing consumer warehouses to verify that the data is identical.
Calculate chargeback to business units
Because the business units’ specific workloads have been moved to dedicated consumers, you can now attribute cost based on compute capacity usage. In Amazon Redshift Serverless, compute capacity is measured in Redshift Processing Units (RPUs), and you pay for the workloads you run in RPU-seconds on a per-second basis. A Redshift administrator can use the SYS_SERVERLESS_USAGE system view to see the details of RPU usage and the associated cost for each consumer workgroup in Redshift Serverless.
To retrieve the total charges for the RPU hours used over a time interval, run the following query on the Sales and Marketing business units’ respective consumer workgroups:
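A sketch of such a query against SYS_SERVERLESS_USAGE; the RPU-hour price is a placeholder that depends on your Region:

```sql
-- Daily RPU usage and estimated cost for this workgroup
SELECT TRUNC(start_time) AS usage_day,
       SUM(charged_seconds) / 3600.0 AS rpu_hours,
       (SUM(charged_seconds) / 3600.0) * 0.375 AS estimated_cost -- price per RPU-hour (placeholder)
FROM sys_serverless_usage
GROUP BY 1
ORDER BY 1;
```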
Clean up
When you’re done with this solution, delete the following resources to avoid incurring ongoing charges:
- The Redshift provisioned cluster
- The Redshift Serverless workgroups and namespaces
Conclusion
In this post, we showed how to isolate business units’ specific workloads and write data to the same producer database from multiple consumer warehouses. This solution has the following benefits:
- Straightforward cost attribution and chargeback to business units
- Ability to use provisioned clusters and serverless workgroups of different sizes to write to the same databases
- Ability to write across accounts and Regions
- Data is live and available to all warehouses as soon as it’s committed
- Writes work even when the producer warehouse is paused
Try out this solution, and share your feedback and questions in the comments.
About the authors
Raks Khare is a Senior Analytics Specialist Solutions Architect at AWS, based out of Pennsylvania. He helps customers across industries architect data analytics solutions at scale on the AWS platform. Outside of work, he enjoys exploring new travel and food experiences and spending quality time with loved ones at home.
is a Senior Analytics Solutions Architect at AWS. She is focused on building and deploying cloud-based analytics solutions that help businesses solve complex challenges. She enjoys exploring new destinations and spending quality time with her loved ones at home.
is a member of the Amazon Redshift product management team, with over 16 years of experience in relational databases, data technologies, and information security. He has a deep interest in solving customer problems around high availability and disaster recovery.