Wednesday, April 2, 2025

Migrate Amazon Redshift from DC2 to RA3 to support increasing data volumes and enhance analytics capabilities

As businesses strive to make informed decisions, the amount of data being generated and analyzed is growing exponentially. Dafiti, an ecommerce company, is no exception: it recognizes the importance of using data to drive strategic decisions. As data volumes continue to grow, Dafiti must manage this information effectively and extract valuable insights from it to gain a competitive edge and make decisions that align with the company's business objectives.

Amazon Redshift is used extensively at Dafiti for data analytics, handling approximately 100,000 daily queries from more than 400 users across three locations. These queries include extract, transform, and load (ETL) processes and one-time analytics. Dafiti's data infrastructure relies heavily on ETL and extract, load, and transform (ELT) processes, with approximately 2,500 unique processes run daily. These processes gather data from approximately 90 different data sources and update around 2,000 tables in the data warehouse, plus a further 3,000 tables in Parquet format stored on Amazon S3 and accessed through a data lake.

The need for more storage capacity for data from more than 90 sources, combined with the better performance, managed storage, and security features available on the newer node type, led us to migrate from DC2 to RA3 nodes.

In this post, we describe how we approached the migration and share what we learned along the way.

Amazon Redshift at Dafiti

Amazon Redshift is a fully managed data warehouse service that Dafiti adopted in 2017. Since then, we have followed many innovations and moved through three different node types. We started with 115 dc2.large nodes and, with the launch of Redshift Spectrum and the migration of our legacy data to the data lake, we simplified our architecture and moved to four dc2.8xlarge nodes. RA3 then made it possible to scale compute and storage independently, according to our needs. This is what brought us to our current setup: eight ra3.4xlarge nodes in the production environment and a single-node ra3.xlplus cluster for testing.

With data sources multiplying and volumes growing exponentially, a problem emerged: the 10 TB of storage in our cluster was no longer enough. Although most of our data resides in the data lake, we needed more storage in the data warehouse. RA3 solved this by scaling compute and storage independently. In addition, zero-ETL integration simplified our data pipelines, letting us ingest large amounts of data in near real time from our Amazon RDS instances, while data sharing enables a data mesh approach.

Migration process to RA3

To determine the right size for our new cluster, we first turned to AWS's sizing recommendations for upgrading between node types.

For our existing cluster of four dc2.8xlarge nodes, the recommendation was to move to ra3.4xlarge nodes.

One concern at this point was whether the reduction in vCPUs and memory would have a significant impact. Although the RA3 cluster has more nodes, its 8 nodes provide a total of only 96 vCPUs and 768 GiB of memory, down from the 128 vCPUs and 976 GiB provided by the 4 DC2 nodes. Despite this, performance improved significantly, with workloads processed 40% faster than before.
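The comparison of cluster totals is simple arithmetic, and a minimal sketch of it follows. The per-node figures are obtained by dividing the cluster totals quoted above by the node counts; the `NodeSpec` class and the `cluster_totals` function are illustrative names, not part of any AWS tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node resources, derived from the cluster totals quoted in the text."""
    name: str
    vcpus: int
    memory_gib: int


def cluster_totals(spec: NodeSpec, nodes: int) -> tuple[int, int]:
    """Return (total vCPUs, total memory in GiB) for a cluster of `nodes` nodes."""
    return spec.vcpus * nodes, spec.memory_gib * nodes


DC2_8XLARGE = NodeSpec("dc2.8xlarge", vcpus=32, memory_gib=244)
RA3_4XLARGE = NodeSpec("ra3.4xlarge", vcpus=12, memory_gib=96)

dc2_total = cluster_totals(DC2_8XLARGE, nodes=4)  # (128, 976)
ra3_total = cluster_totals(RA3_4XLARGE, nodes=8)  # (96, 768)

# Relative change in vCPUs when moving from the DC2 to the RA3 cluster.
vcpu_change = (ra3_total[0] - dc2_total[0]) / dc2_total[0]
print(f"DC2: {dc2_total}, RA3: {ra3_total}, vCPU change: {vcpu_change:.0%}")
```

The RA3 cluster therefore has a quarter fewer vCPUs than the DC2 cluster, which is why the 40% improvement in workload speed was not a foregone conclusion.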

AWS suggests validating that the chosen Amazon Redshift configuration suits your workload before migrating the production environment, for example with Redshift Test Drive. At Dafiti, our workload gives us some flexibility to make changes in specific windows without affecting the business, so we did not need to use Redshift Test Drive.

We carried out the migration as follows:

  1. We created a new cluster with eight ra3.4xlarge nodes from a snapshot of our four-node dc2.8xlarge cluster. It took around 10 minutes to create the new cluster with approximately 8.75 TB of data.
  2. We paused our ETL and ELT orchestrator so that no data was updated during the migration period.
  3. We changed the DNS to point to the new cluster, transparently for our users. At this point, only one-time queries and users connecting directly reached the new cluster.
  4. After validating the read queries, we restarted our orchestrator so that the data transformation queries ran on the new cluster.
  5. We decommissioned the DC2 cluster, completing the migration.
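Step 1 can be sketched with the AWS SDK for Python. This is a sketch under assumptions and not necessarily the exact calls used in our migration: the snapshot and cluster identifiers are hypothetical placeholders, and `client` is expected to be a boto3 Redshift client (`boto3.client("redshift")`), whose `restore_from_cluster_snapshot` operation accepts `NodeType` and `NumberOfNodes` to restore a snapshot onto a different node configuration.

```python
def build_restore_request(snapshot_id: str, new_cluster_id: str) -> dict:
    """Build the parameters for restoring a DC2 snapshot onto RA3 nodes."""
    return {
        "ClusterIdentifier": new_cluster_id,  # name of the new RA3 cluster
        "SnapshotIdentifier": snapshot_id,    # snapshot taken from the DC2 cluster
        "NodeType": "ra3.4xlarge",            # target node type from the text
        "NumberOfNodes": 8,                   # target node count from the text
    }


def restore_to_ra3(client, snapshot_id: str, new_cluster_id: str) -> dict:
    """Call the Redshift restore operation through the supplied client."""
    params = build_restore_request(snapshot_id, new_cluster_id)
    return client.restore_from_cluster_snapshot(**params)


# Hypothetical identifiers, for illustration only.
request = build_restore_request("dc2-final-snapshot", "analytics-ra3")
print(request)
```

Keeping the request builder separate from the API call makes it possible to review the parameters before anything is created in the account.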

The following diagram illustrates the migration architecture.

Migration architecture

During the migration, we defined checkpoints at which we would roll back if something unexpected happened. The first checkpoint was in Step 3, where a drop in the performance of user queries would trigger a rollback. The second checkpoint was in Step 4: if the ETL and ELT processes showed discrepancies or performed worse than the metrics collected for them on DC2, we would roll back. In both cases, the rollback simply meant changing the DNS to point to DC2 again, because it would still be possible to rebuild all processes within the defined maintenance window.
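The checkpoint logic can be sketched as a small decision function. This is an illustration only: the function names, the cluster labels, and the `tolerance` parameter are assumptions, and the runtimes in the example are made-up numbers rather than measurements from our migration.

```python
def should_roll_back(baseline_s: float, observed_s: float,
                     had_discrepancies: bool = False,
                     tolerance: float = 0.0) -> bool:
    """Return True when a checkpoint fails.

    A checkpoint fails if the processes showed discrepancies, or if the
    observed runtime exceeds the DC2 baseline by more than `tolerance`
    (a fraction; the default of 0.0 is an illustrative assumption).
    """
    if had_discrepancies:
        return True
    return observed_s > baseline_s * (1.0 + tolerance)


def dns_target(checkpoint_failed: bool) -> str:
    """Rolling back only means pointing DNS at the DC2 cluster again."""
    # "dc2-cluster" and "ra3-cluster" are hypothetical labels.
    return "dc2-cluster" if checkpoint_failed else "ra3-cluster"


# Made-up example: a process that took 100 s on DC2 and 60 s on RA3.
failed = should_roll_back(baseline_s=100.0, observed_s=60.0)
print(dns_target(failed))
```

Because the rollback is a single DNS change, the same check can be applied at both checkpoints without any additional recovery logic.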

Outcomes

The RA3 family introduced many features, made scaling flexible, and let us allocate compute and storage independently, which changed the game at Dafiti. Before the migration, our cluster performed as expected, but its limited storage capacity required daily maintenance to keep disk usage under control.

The RA3 nodes performed better, with workloads running 40% faster. This significantly reduces delivery times for our critical data analytics processes.

The improvement became even more pronounced in the days after the migration, thanks to Amazon Redshift's ability to optimize caching and statistics and to apply performance recommendations. Amazon Redshift also recommends optimizations for our cluster based on our workload, and this tooling played a key role in achieving a smooth transition.

Storage capacity grew from 10 TB to multiple petabytes, which resolved Dafiti's primary challenge of accommodating growing data volumes. This increase in storage capacity, together with the performance gains, showed that migrating to RA3 nodes was a sound strategic decision that addressed Dafiti's evolving data infrastructure needs.

We have used data sharing since the migration to share data between the production and development environments, and the natural next step is to use it to enable a data mesh across Dafiti. The only limitation we found was the requirement to enable case sensitivity, a prerequisite for data sharing, which forced us to rework some processes that were affected. The benefits we have seen since moving to RA3 far outweigh this limitation.

Conclusion

This post described Dafiti's experience migrating its data warehouse to Amazon Redshift RA3 nodes and the benefits that followed from the migration.

Are you curious to learn more about how we work with data at Dafiti?

About the Authors

The Data Engineering Coordinator at Dafiti worked for many years in software engineering, moved to data engineering, and now leads the team responsible for Dafiti's data platform in Latin America.

A Data Engineering Specialist at Dafiti, responsible for maintaining and evolving the entire data platform on AWS.

A Data Engineer at Dafiti, responsible for maintaining the data platform and providing internal clients with access to data from a variety of sources.

A lead data engineer at Dafiti, responsible for the long-term sustainability of the company's data platform, using AWS services to optimize its performance and scalability.
