Monday, March 31, 2025

What’s holding you back from unlocking real-time analytics? Connect to your data with AWS Glue, then move it from Snowflake to Amazon S3 for actionable insights.

In today’s data-driven landscape, the ability to seamlessly integrate diverse data sources is crucial for deriving meaningful insights and fostering innovation. As organizations increasingly rely on data spread across various platforms, including cloud-based repositories like Amazon S3 and software-as-a-service (SaaS) applications, the need to consolidate these disparate data sources has never been more pressing.

AWS Glue is a serverless data integration service that helps consolidate diverse data sources and unlock the full value of your organization’s data. Organizations can open up new opportunities in data warehousing, business intelligence, and self-service analytics by using AWS Glue to integrate data from Snowflake, Amazon S3, and SaaS applications, and feed it into downstream processes.

In this post, we explore how you can use AWS Glue to move data from Snowflake into your data integration pipelines, unlocking the potential of your data ecosystem and driving meaningful outcomes across a variety of use cases.

Use case

Consider a prominent e-commerce organization that relies heavily on data-driven insights to optimize its operations, marketing strategies, and customer experience. The company stores vast amounts of transactional data, customer information, and product catalogs in Snowflake. However, it also collects data from a diverse range of other sources, including web logs stored on Amazon S3, social media platforms, and third-party data providers. To fully understand the intricacies of the business and make informed decisions, the company must integrate and analyze data from all of these sources.

A crucial business requirement for the e-commerce company is to produce a Pricing Summary Report that provides a detailed assessment of pricing and discounting strategies. This report is essential for understanding revenue streams, identifying opportunities for improvement, and making data-driven decisions about pricing and promotions. After the Pricing Summary Report is generated and stored in Amazon S3, the company can use AWS analytics services to build business intelligence dashboards and run ad hoc queries on the report data. This empowers business analysts and decision-makers to gain valuable insights, visualize key metrics, and explore the data in greater depth, enabling informed decision-making and strategic planning for pricing and promotional strategies.

Solution overview

The following diagram depicts a secure and efficient architecture for integrating Snowflake data with Amazon S3, using the native Snowflake connector in AWS Glue.

This setup provides seamless connectivity across distinct virtual private clouds (VPCs) without exposing data to the public internet, a crucial requirement for most organizations.

The key elements and steps in the integration process are:


  1. Establish a private connection between your AWS account and your Snowflake account using Amazon Virtual Private Cloud (Amazon VPC) elastic network interfaces and Snowflake’s PrivateLink feature. This creates a secure path between the AWS and Snowflake VPCs, keeping data transfer within the AWS network.
  2. Create a private hosted zone in your VPC to resolve the Snowflake endpoint. This configuration allows AWS Glue jobs to connect to Snowflake using a custom DNS name, keeping the data transfer secure and reliable.
  3. Create an AWS Glue job to extract, transform, and load data from Snowflake into Amazon S3. The job uses the secure connection established through the VPC endpoints to access Snowflake safely. The Snowflake credentials are stored securely in AWS Secrets Manager, and the AWS Glue job retrieves them at runtime to authenticate and connect to Snowflake (see the sketch after this list). A Secrets Manager VPC endpoint enables this communication without traversing the public internet, improving security and efficiency.

  4. Store the transformed data in Amazon S3, organized into partitioned folders to streamline access and management. An S3 VPC endpoint enables secure communication with the service without exposing traffic to the public internet. Amazon S3 also stores the AWS Glue scripts, logs, and temporary data produced during the ETL process.
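To make the credential flow in step 3 concrete, here is a minimal sketch, assuming the secret name and key layout described later in this post, of how code running inside the VPC (for example, within an AWS Glue job) could fetch the Snowflake credentials at runtime:

```python
import json

import boto3

# Fetch the Snowflake credentials at runtime. With a Secrets Manager VPC
# endpoint in place, this call stays on the AWS network.
secrets = boto3.client("secretsmanager")
response = secrets.get_secret_value(SecretId="blog-glue-snowflake-credentials")
creds = json.loads(response["SecretString"])

sf_user = creds["sfUser"]            # Snowflake user
sf_password = creds["sfPassword"]    # Snowflake password
sf_warehouse = creds["sfWarehouse"]  # Snowflake warehouse
```

Because the AWS Glue connection created later in this post references the same secret, you don’t normally need to do this by hand; the sketch only illustrates what happens behind the scenes.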

This approach offers several advantages:

  • By leveraging PrivateLink and VPC endpoints, data transfer between Snowflake and Amazon S3 is safeguarded within the secure confines of the AWS ecosystem, significantly reducing exposure to potential security vulnerabilities.
  • AWS Glue streamlines the ETL process, providing a scalable and flexible solution for data integration between Snowflake and Amazon S3.
  • By using Amazon S3 for data storage, combined with the scalable and flexible AWS Glue pricing model, organizations can effectively manage the costs of data integration and administration.
  • The architecture scales for larger data transfers and makes it straightforward to integrate additional data sources and destinations as needed.

By combining the strengths of AWS Glue, AWS PrivateLink, and related services, organizations can build a robust, secure, and efficient data integration solution that unlocks the full value of their combined Snowflake and Amazon S3 datasets, enabling enhanced analytics and business intelligence capabilities.

Prerequisites

Before you get started, make sure you have the following prerequisites:

  1. You have access to an AWS account with the permissions needed to provision resources in services such as Route 53, Amazon S3, AWS Glue, Secrets Manager, and Amazon VPC. This post uses AWS CloudFormation, which lets you model, provision, and manage AWS and third-party resources by treating infrastructure as code.
  2. You have access to Snowflake hosted on AWS with the permissions needed to set up PrivateLink. Refer to the Snowflake documentation for the configuration steps and required access level. After you enable PrivateLink, note down the values of the following parameters provided by Snowflake for later use:
    1. privatelink-vpce-id
    2. privatelink-account-url
    3. privatelink_ocsp-url
    4. regionless-snowsight-privatelink-url
  3. You have a Snowflake user snowflakeUser and password snowflakePassword with the permissions needed to read from and write to Snowflake. The user and password are used in the AWS Glue connection to authenticate with Snowflake (a quick connectivity check follows this list).
  4. The Snowflake user needs a default warehouse, or a warehouse name to use. We use snowflakeWarehouse to refer to the warehouse name in this post.
  5. If you’re new to Snowflake, complete the Snowflake tutorial first; by the end of it, you’ll have created the Snowflake objects, including warehouses, databases, and tables, needed to store and query your data.
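Before wiring the credentials into AWS Glue, you can optionally verify them directly. A minimal sketch using the snowflake-connector-python package (an extra dependency, not part of this post’s setup) might look like the following; the account identifier shown is a hypothetical placeholder derived from your privatelink-account-url:

```python
# pip install snowflake-connector-python  (assumed extra dependency)
import snowflake.connector

# Hypothetical placeholders; substitute your own values. With PrivateLink,
# the account identifier typically includes a ".privatelink" segment.
conn = snowflake.connector.connect(
    user="snowflakeUser",
    password="snowflakePassword",
    account="<account>.privatelink",
    warehouse="snowflakeWarehouse",
)
try:
    with conn.cursor() as cur:
        cur.execute("SELECT CURRENT_USER(), CURRENT_WAREHOUSE()")
        print(cur.fetchone())  # confirms authentication and warehouse access
finally:
    conn.close()
```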

Create resources with AWS CloudFormation

A CloudFormation template is provided so you can deploy the foundational resources quickly. You can review and customize it to suit your needs. The template creates the resources used throughout this post.

To create your resources, follow these steps:

  1. Sign in to your AWS account and open the AWS CloudFormation console.
  2. Choose Launch Stack to launch the CloudFormation stack.
  3. Provide the CloudFormation stack parameters:
    1. Enter the privatelink-account-url value obtained in the prerequisites.
    2. Enter the privatelink_ocsp-url value obtained in the prerequisites.
    3. Enter the privatelink-vpce-id value obtained in the prerequisites.
    4. Enter the IP address range (CIDR) for the first private subnet.
    5. Enter the IP address range (CIDR) for the second private subnet.
    6. Enter the IP address range (CIDR) for the first public subnet.
    7. Enter the IP address range (CIDR) for the second public subnet.
    8. Enter the regionless-snowsight-privatelink-url value obtained in the prerequisites.
    9. Enter the IP address range (CIDR) for the VPC.
  4. Choose Next.
  5. Choose Next.
  6. Review the settings and complete the stack creation.

After the CloudFormation stack is created, you can view the resources it provisioned on the Resources tab.

On the Outputs tab, note the values of GlueSecurityGroupId, VpcId, and PrivateSubnet1Id to use in the steps that follow.
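If you prefer to capture the outputs programmatically rather than copying them from the console, a minimal boto3 sketch might look like this (the stack name matches the one used for cleanup later in this post):

```python
import boto3

# Read the stack outputs instead of copying them from the console.
cfn = boto3.client("cloudformation")
stack = cfn.describe_stacks(StackName="blog-glue-snowflake")["Stacks"][0]
outputs = {o["OutputKey"]: o["OutputValue"] for o in stack["Outputs"]}

for key in ("GlueSecurityGroupId", "VpcId", "PrivateSubnet1Id"):
    print(f"{key} = {outputs[key]}")
```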

Update the Secrets Manager secret with the Snowflake credentials for the AWS Glue connection

To update the secret with the Snowflake user snowflakeUser, password snowflakePassword, and warehouse snowflakeWarehouse that the AWS Glue connection uses to authenticate with Snowflake, complete the following steps (a scripted alternative follows the list):

  1. On the Secrets Manager console, choose Secrets in the navigation pane.
  2. Open the secret blog-glue-snowflake-credentials.
  3. Under Secret value, choose Retrieve secret value.
  1. Choose Edit.
  2. Enter the user snowflakeUser, password snowflakePassword, and warehouse snowflakeWarehouse for the keys sfUser, sfPassword, and sfWarehouse, respectively.
  3. Choose Save.
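Equivalently, you can set the secret value with boto3; a minimal sketch, assuming the same key names, is shown below:

```python
import json

import boto3

# Store the Snowflake user, password, and warehouse under the key names
# the AWS Glue connection expects (sfUser, sfPassword, sfWarehouse).
secrets = boto3.client("secretsmanager")
secrets.put_secret_value(
    SecretId="blog-glue-snowflake-credentials",
    SecretString=json.dumps(
        {
            "sfUser": "snowflakeUser",            # replace with your user
            "sfPassword": "snowflakePassword",    # replace with your password
            "sfWarehouse": "snowflakeWarehouse",  # replace with your warehouse
        }
    ),
)
```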

Create the AWS Glue connection to Snowflake

An AWS Glue connection is an AWS Glue Data Catalog object that stores login credentials, URI strings, VPC information, and more for a particular data store. AWS Glue crawlers, jobs, and development endpoints use connections to access data stores. To create an AWS Glue connection to your Snowflake database, complete the following steps (a quick programmatic check follows the steps):

  1. On the AWS Glue console, choose Data connections in the navigation pane.
  2. Choose Create connection.
  3. Search for and select the Snowflake connector.
  4. Choose Next.
  1. For Snowflake URL, enter https://<privatelink-account-url>.

Use the privatelink-account-url value obtained in the prerequisites.

  1. For AWS Secret, choose the secret blog-glue-snowflake-credentials.
  2. For VPC, choose the VpcId value obtained from the CloudFormation stack output.
  3. For Subnet, choose the PrivateSubnet1Id value obtained from the CloudFormation stack output.
  4. For Security groups, choose the GlueSecurityGroupId value obtained from the CloudFormation stack output.
  5. Choose Next.
  1. For Name, enter glue-snowflake-connection.
  2. Choose Next.
  1. Choose Create connection.
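To confirm the connection exists and is wired to the right VPC resources, a small boto3 check might look like this:

```python
import boto3

glue = boto3.client("glue")

# Fetch the connection and print its non-secret details.
conn = glue.get_connection(Name="glue-snowflake-connection")["Connection"]
print(conn["Name"], conn["ConnectionType"])
print(conn.get("PhysicalConnectionRequirements", {}))  # subnet, security groups
```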

Create an AWS Glue job

You’re now ready to define the AWS Glue job using the Snowflake connection. To create an AWS Glue job that reads from Snowflake, complete the following steps:

  1. On the AWS Glue console, choose ETL jobs in the navigation pane and create a new job in AWS Glue Studio.
  1. Choose the Job details tab.
  2. For Name, enter Pricing Summary Report Job.
  3. For Description, enter an optional description.
  4. For IAM Role, choose the role that has access to the target S3 location where the job writes to, as well as the source location from which it loads the Snowflake data, and that can run an AWS Glue job. You can find this role in your CloudFormation stack’s output, named blog-glue-snowflake-GlueServiceRole-*.
  5. Keep the remaining settings at their default values.
  6. Choose Save.
  1. On the Visual tab, add a Snowflake node as the source.
  1. In the AWS Glue Studio canvas, select the Snowflake node to edit its properties.
  2. For Name, enter Snowflake_Pricing_Summary.
  3. For Snowflake connection, choose glue-snowflake-connection.
  4. For Snowflake source, choose the option to enter a custom query.
  5. For Database, enter snowflake_sample_data.
  6. For Snowflake query, enter the following query:
SELECT l_returnflag,
       l_linestatus,
       Sum(l_quantity) AS sum_qty,
       Sum(l_extendedprice) AS sum_base_price,
       Sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
       Sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
       Avg(l_quantity) AS avg_qty,
       Avg(l_extendedprice) AS avg_price,
       Avg(l_discount) AS avg_disc,
       Count(*) AS count_order
FROM tpch_sf1.lineitem
WHERE l_shipdate <= Dateadd(day, -90, To_date('1998-12-01'))
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;

The Pricing Summary Report provides a summary pricing report for all line items shipped as of a given date. The date is within 60-120 days of the greatest ship date contained in the database. The query lists totals for extended price, discounted extended price, discounted extended price plus tax, average quantity, average extended price, and average discount. These aggregates are grouped by RETURNFLAG and LINESTATUS and listed in ascending order of RETURNFLAG and LINESTATUS. A count of the number of line items in each group is included.

  1. For Custom Snowflake properties, specify the key sfSchema with the value tpch_sf1.
  2. Choose Save.

Next, add an Amazon S3 target to store the results in your S3 bucket:

  1. On the Visual tab, choose to add a new node.
  2. For the node type, select Amazon S3 as the target.
  1. In the AWS Glue Studio canvas, select the S3 node to edit its properties.
  2. For Name, enter S3_Pricing_Summary.
  3. For Node parents, choose Snowflake_Pricing_Summary.
  4. For Format, choose an output format such as Parquet.
  5. For S3 Target Location, enter s3://<your bucket>/pricing_summary_report/ (use the name of your bucket).
  6. For Data Catalog update options, choose the option to create a table in the Data Catalog and update the schema on subsequent runs.
  7. For Database, choose db_blog_glue_snowflake.
  8. For Table name, enter tb_pricing_summary.
  9. Choose Save.
  10. Choose Run to run the job, and monitor its status on the Runs tab.

You have completed the steps to create an AWS Glue job that extracts data from Snowflake and loads the results into an Amazon S3 bucket over a secure private connection. To transform the data before loading it into Amazon S3, you can use the transformations available in AWS Glue Studio. AWS Glue transformations are crucial when building AWS Glue jobs because they let you cleanse, enrich, and restructure data so that it is properly formatted and of high quality for downstream processing.
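For reference, a script-mode sketch of roughly what AWS Glue Studio generates for this job is shown below. Treat the exact connection_options keys and sink settings as assumptions drawn from typical generated scripts, and prefer the script your own job produces:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# The Pricing Summary Report query (TPC-H Q1) shown earlier in this post.
QUERY = """
SELECT l_returnflag, l_linestatus,
       Sum(l_quantity) AS sum_qty,
       Sum(l_extendedprice) AS sum_base_price,
       Sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
       Sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
       Avg(l_quantity) AS avg_qty,
       Avg(l_extendedprice) AS avg_price,
       Avg(l_discount) AS avg_disc,
       Count(*) AS count_order
FROM tpch_sf1.lineitem
WHERE l_shipdate <= Dateadd(day, -90, To_date('1998-12-01'))
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus
"""

# Read from Snowflake through the connection created earlier. Option names
# mirror typical Glue Studio output for a Snowflake source (assumed).
snowflake_node = glue_context.create_dynamic_frame.from_options(
    connection_type="snowflake",
    connection_options={
        "connectionName": "glue-snowflake-connection",
        "autopushdown": "on",
        "sfDatabase": "snowflake_sample_data",
        "sfSchema": "tpch_sf1",
        "query": QUERY,
    },
    transformation_ctx="Snowflake_Pricing_Summary",
)

# Write the results to S3 and register the table in the Data Catalog.
s3_sink = glue_context.getSink(
    path="s3://<your bucket>/pricing_summary_report/",  # your bucket name
    connection_type="s3",
    updateBehavior="UPDATE_IN_DATABASE",
    partitionKeys=[],
    enableUpdateCatalog=True,
    transformation_ctx="S3_Pricing_Summary",
)
s3_sink.setCatalogInfo(
    catalogDatabase="db_blog_glue_snowflake",
    catalogTableName="tb_pricing_summary",
)
s3_sink.setFormat("glueparquet")
s3_sink.writeFrame(snowflake_node)

job.commit()
```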

Validate the results

After the job is complete, you can validate the output of the ETL job run in Amazon Athena, a serverless interactive analytics service. Complete the following steps:

  1. On the Athena console, open the query editor.
  2. For Workgroup, choose blog-workgroup.
  3. If prompted to confirm the workgroup settings, acknowledge them.
  4. For Database, choose db_blog_glue_snowflake.
  5. Enter the following query:
SELECT l_returnflag,
       l_linestatus,
       SUM(sum_qty) AS total_sum_qty,
       SUM(sum_base_price) AS total_sum_base_price
FROM db_blog_glue_snowflake.tb_pricing_summary
GROUP BY l_returnflag, l_linestatus
  1. Choose Run.

You have successfully validated the data produced by the Pricing Summary Report Job.
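You can also run the same validation query programmatically; a minimal boto3 sketch against the Athena API might look like this:

```python
import time

import boto3

athena = boto3.client("athena")

# Run the validation query in the blog-workgroup workgroup.
execution = athena.start_query_execution(
    QueryString="""
        SELECT l_returnflag, l_linestatus,
               SUM(sum_qty) AS total_sum_qty,
               SUM(sum_base_price) AS total_sum_base_price
        FROM db_blog_glue_snowflake.tb_pricing_summary
        GROUP BY l_returnflag, l_linestatus
    """,
    QueryExecutionContext={"Database": "db_blog_glue_snowflake"},
    WorkGroup="blog-workgroup",
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then print the first page of results.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```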

Clean up

To clean up your resources, complete the following steps (a scripted alternative follows the list):

  1. Delete the AWS Glue job Pricing Summary Report Job.
  2. Delete the AWS Glue connection glue-snowflake-connection.
  3. Delete any objects from the S3 bucket blog-glue-snowflake-*.
  4. Delete the CloudFormation stack blog-glue-snowflake.
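A scripted version of the same cleanup, as a minimal boto3 sketch (substitute your actual bucket name), follows:

```python
import boto3

glue = boto3.client("glue")
glue.delete_job(JobName="Pricing Summary Report Job")
glue.delete_connection(ConnectionName="glue-snowflake-connection")

# Empty the bucket first; CloudFormation cannot delete a non-empty bucket.
s3 = boto3.resource("s3")
s3.Bucket("<your blog-glue-snowflake bucket>").objects.all().delete()

# Deleting the stack removes the remaining resources it created.
cfn = boto3.client("cloudformation")
cfn.delete_stack(StackName="blog-glue-snowflake")
```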

Conclusion

By using the native Snowflake connector in AWS Glue, you can build an efficient and secure approach to integrating data from Snowflake into your data pipelines on AWS. This post showed how to establish a secure connection between AWS Glue and your Snowflake instance using PrivateLink, Amazon VPC, security groups, and Secrets Manager.

With this architecture, you can read from and write to Snowflake tables directly from AWS Glue jobs running on Spark. The secure connectivity pattern keeps data transfers on private networks, protecting confidential information and safeguarding against unauthorized access.

By integrating AWS services with data platforms such as Snowflake, organizations can build scalable, secure data lakes and pipelines that power analytics, business intelligence, data science, and machine learning applications.

The native Snowflake connector and private connectivity pattern described here provide a performant, secure way to incorporate Snowflake data into AWS big data workflows. This enables scalable analytics while maintaining data governance, compliance, and access control. To learn more about AWS Glue, visit https://aws.amazon.com/glue/.


About the Authors

is a Sr. Specialist Solutions Architect at AWS with expertise in designing scalable AI/ML solutions that meet customer requirements. He has helped numerous clients across sectors, including life sciences and healthcare, retail, banking, and aviation, build innovative solutions using data analytics, machine learning, and generative AI. He is passionate about rock and roll music and cooking, and enjoys spending time with his family.

is a Solutions Architect in Global Life Sciences at AWS, where he focuses on building innovative and scalable solutions that meet the evolving needs of customers, drawing on his background with AWS analytics services. Outside of work, he enjoys rock climbing and adventure sports, and having completed two marathons, he is now training for his next one.

is a Sr. AWS Specialist Solutions Architect with a deep passion for analytics, dedicated to helping customers derive actionable insights from their data. Drawing on his expertise, he crafts solutions that equip businesses to make informed, data-driven decisions. Navnit also authored the book “Data Wrangling on AWS”, showcasing his command of the subject.

is a Sr. Data Architect with expertise in designing and implementing large-scale data integration solutions using Amazon Managed Workflows for Apache Airflow (MWAA) and AWS Glue ETL, with a focus on simplifying life for customers facing complex data integration and orchestration challenges. He relies on fully managed AWS services to deliver results with minimal effort, and keeps up with the latest MWAA and AWS Glue features and updates.

is a Sr. Partner Solutions Architect at AWS with more than two decades of experience in database and analytics products from enterprise and cloud providers. He helps prominent companies design and deploy data analytics solutions and products.
