Thursday, December 5, 2024

Intel and Rockset Collaborate to Optimize Star Schema Benchmark, Delivering 84% Performance Gain on Intel Ice Lake Platforms

Introduction

We are constantly improving Rockset's performance and evaluating hardware options to find the best price-performance for fast data ingest and real-time queries.

As part of this effort, we released a new version of Rockset that takes advantage of 3rd Generation Intel Xeon Scalable Processors, code-named Ice Lake. With the move to the new hardware, Rockset queries are now 84% faster than before on the Star Schema Benchmark (SSB), an industry-standard benchmark for the query patterns typical of data applications.

The release takes advantage of the Ice Lake architecture to speed up SSB. It also includes several optimizations for query patterns that are common in data applications:

  • Materialized Common Table Expressions (CTEs): Rockset materializes CTEs to reduce overall query execution time.
  • Stats-Based Predicate Pushdown: Rockset uses collection statistics to optimize predicate pushdown, making queries up to 10x faster.
  • Row-Store Cache: Rockset added a multiversion concurrency control (MVCC) cache to its row store. It reduces the overhead of metadata operations and thereby lowers query latency when the working set fits in memory.

In this blog, we describe the configuration, results, and performance enhancements for SSB.

Configuration & Results

The Star Schema Benchmark (SSB) is an industry-standard benchmark based on common query patterns in data applications.

To evaluate the impact of Intel Ice Lake on real-time analytics workloads, we ran a before-and-after comparison using SSB. Rockset used an XLarge Virtual Instance (VI) with 32 vCPUs and 256 GiB of RAM.

SSB consists of 13 analytical queries.

The full SSB query suite ran in 733 milliseconds on Rockset with Intel Ice Lake, 84% faster than before. Rockset was faster on Intel Ice Lake for all 13 SSB queries, and one query ran 95% faster.

We used the columnar index, ran each query 1,000 times on a warm OS cache, and recorded the mean runtime. No form of query result caching was used for the evaluation. Times are reported by Rockset's API server.
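The measurement procedure (warm-up, 1,000 timed runs per query, mean runtime) can be sketched as a small harness. This is a minimal sketch, not Rockset's benchmark tooling: `run_query` is a hypothetical zero-argument callable standing in for submitting one SSB query, and the harness times it on the client with `time.perf_counter`, whereas the reported numbers come from Rockset's API server. Disabling result caching has to happen on the server and cannot be enforced here.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(
    queries: Dict[str, Callable[[], object]],
    runs: int = 1000,
    warmup: int = 10,
) -> Dict[str, float]:
    """Return the mean runtime in milliseconds for each named query.

    `queries` maps a query name to a hypothetical callable that executes
    the query once. `runs` must be at least 1.
    """
    means: Dict[str, float] = {}
    for name, run_query in queries.items():
        # Untimed warm-up runs, intended to warm the OS page cache.
        for _ in range(warmup):
            run_query()
        samples = []
        for _ in range(runs):
            start = time.perf_counter()
            run_query()
            samples.append((time.perf_counter() - start) * 1000.0)
        means[name] = statistics.mean(samples)
    return means


if __name__ == "__main__":
    # Stand-in "query" that sleeps for roughly 1 ms.
    results = benchmark({"Q1.1": lambda: time.sleep(0.001)}, runs=50, warmup=2)
    print(f"suite total: {sum(results.values()):.1f} ms")
```

Summing the per-query means gives a total runtime for the whole suite.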

Rockset Performance Enhancements

Below we highlight several performance enhancements for query patterns that are common in data applications.

Materialized Common Table Expressions (CTEs)

Rockset materializes CTEs to reduce overall query execution time.

CTEs and subqueries are common ways to express complex queries. When the same CTE is referenced more than once, evaluating it separately for each reference repeats the same work and increases total execution time. In the following query, the CTE max_category_price is referenced twice:

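-- max_category_price is referenced by both branches of the UNION ALL.
WITH max_category_price AS
  (SELECT category, MAX(price) AS max_price FROM products GROUP BY category)
-- First reference: ussales joined to the CTE
SELECT c1.category, SUM(c1.quantity), MAX(c2.max_price)
  FROM ussales c1 JOIN max_category_price c2 ON c1.category = c2.category
 GROUP BY c1.category
UNION ALL
-- Second reference: eusales joined to the same CTE
SELECT c1.category, SUM(c1.quantity), MAX(c2.max_price)
  FROM eusales c1 JOIN max_category_price c2 ON c1.category = c2.category
 GROUP BY c1.category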

With materialized CTEs, Rockset executes the CTE only once and reuses its result for every reference, reducing resource consumption and query latency.
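The difference between inline and materialized evaluation can be shown with a toy model in Python. It is not Rockset's query engine: the sample rows, the `evaluate_cte` helper and the `CALLS` counter are all hypothetical. Only the CTE's logic (maximum price per category, joined to two sales tables and combined with UNION ALL) mirrors the query above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sample rows: (category, price) and (category, quantity).
PRODUCTS: List[Tuple[str, float]] = [("books", 12.0), ("books", 30.0), ("games", 59.0)]
US_SALES: List[Tuple[str, int]] = [("books", 3), ("games", 1), ("books", 2)]
EU_SALES: List[Tuple[str, int]] = [("games", 4), ("books", 1)]

# Counts how many times the CTE body is evaluated.
CALLS = {"cte": 0}


def evaluate_cte() -> Dict[str, float]:
    """max_category_price: MAX(price) per category over PRODUCTS."""
    CALLS["cte"] += 1
    result: Dict[str, float] = {}
    for category, price in PRODUCTS:
        result[category] = max(price, result.get(category, price))
    return result


def branch(
    sales: List[Tuple[str, int]], max_price: Dict[str, float]
) -> List[Tuple[str, int, float]]:
    """One UNION ALL branch: inner join on category, then SUM(quantity) per category."""
    totals: Dict[str, int] = defaultdict(int)
    for category, quantity in sales:
        if category in max_price:
            totals[category] += quantity
    return [(c, q, max_price[c]) for c, q in sorted(totals.items())]


def run(materialize: bool) -> List[Tuple[str, int, float]]:
    if materialize:
        cte = evaluate_cte()  # evaluated once, shared by both branches
        return branch(US_SALES, cte) + branch(EU_SALES, cte)
    # Inline evaluation: each reference to the CTE recomputes it.
    return branch(US_SALES, evaluate_cte()) + branch(EU_SALES, evaluate_cte())


CALLS["cte"] = 0
inline_rows = run(materialize=False)
print(CALLS["cte"])  # 2
CALLS["cte"] = 0
materialized_rows = run(materialize=True)
print(CALLS["cte"])  # 1
```

Both strategies return the same rows; materialization only changes how often the CTE body runs, which matters when that body is expensive.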

Stats-Based Predicate Pushdown

Rockset uses collection statistics to optimize predicate pushdown, making queries up to 10x faster.

A predicate is an expression that evaluates to true or false for each record, typically found in the WHERE or HAVING clause of a SQL query. Predicate pushdown applies predicates as early as possible, when data is read from storage, so that less data has to be processed at higher levels of query execution.

Rockset stores data in a search index, a columnar index, and a row store, which together allow data to be retrieved efficiently.

Rockset can use its search index to find the documents that match a query's predicates and then fetch those documents from the row store.
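This two-step read path (search index for matching document identifiers, then row store for the documents) can be modeled with plain dictionaries. It is a simplified sketch under our own assumptions: the inverted index maps (field, value) pairs to sets of document IDs, a dict stands in for the row store, and the sample documents are made up. Rockset's actual index layout is not described in this post.

```python
from typing import Any, Dict, List, Optional, Set, Tuple

Document = Dict[str, Any]
Index = Dict[Tuple[str, Any], Set[int]]

# Stand-in row store: document ID -> full document (hypothetical data).
ROW_STORE: Dict[int, Document] = {
    1: {"first_name": "Asha", "last_name": "Borthakur", "age": 10},
    2: {"first_name": "Ana", "last_name": "Lee", "age": 10},
    3: {"first_name": "Raj", "last_name": "Shah", "age": 12},
}


def build_search_index(rows: Dict[int, Document]) -> Index:
    """Inverted index: (field, value) -> IDs of documents with that value."""
    index: Index = {}
    for doc_id, doc in rows.items():
        for field, value in doc.items():
            index.setdefault((field, value), set()).add(doc_id)
    return index


def query(
    index: Index,
    rows: Dict[int, Document],
    predicates: Dict[str, Any],
    projection: List[str],
) -> List[Document]:
    """AND together equality predicates via the index, then read the row store."""
    # Step 1: intersect the posting sets of all predicates.
    matching: Optional[Set[int]] = None
    for field, value in predicates.items():
        ids = index.get((field, value), set())
        matching = set(ids) if matching is None else matching & ids
    if matching is None:  # no predicates: every document matches
        matching = set(rows)
    # Step 2: fetch only the requested fields of the matching documents.
    return [{f: rows[doc_id][f] for f in projection} for doc_id in sorted(matching)]


index = build_search_index(ROW_STORE)
print(query(index, ROW_STORE, {"age": 10}, ["first_name"]))
# [{'first_name': 'Asha'}, {'first_name': 'Ana'}]
```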

The predicates in a query can be broad (matching many documents) or narrow (matching few). Pushing a broad predicate down to the search index can slow a query considerably, because many document identifiers must be retrieved and then looked up in the row store. To avoid this, Rockset introduced stats-based predicate pushdown, which uses statistics about the collection to decide whether each predicate is broad or narrow and pushes down only the narrow ones. Pushing down narrow predicates can make queries up to 10x faster.

Consider the following query:

SELECT first_name, last_name, age
FROM college_students
WHERE last_name = 'Borthakur' AND age = 10;

The last name "Borthakur" is rare, so last_name = 'Borthakur' is a narrow predicate, while age = 10 matches many students and is a broad one. Stats-based predicate pushdown pushes down only the filter last_name = 'Borthakur', which speeds up the query.
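The broad-versus-narrow decision can be sketched as a selectivity estimate per predicate. This minimal sketch reflects our assumptions about the general technique, not Rockset's implementation: the collection statistics are per-field value counts, an equality predicate's selectivity is estimated as (documents with that value) / (total documents), and only predicates below a hypothetical `NARROW_THRESHOLD` of 1% are pushed down. The rest are applied as filters to the fetched documents. The statistics are made-up numbers for the example query.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

# Hypothetical cutoff: predicates matching under 1% of documents count as narrow.
NARROW_THRESHOLD = 0.01


@dataclass
class CollectionStats:
    total_docs: int
    # field -> {value -> number of documents with that value}
    value_counts: Dict[str, Dict[Any, int]]

    def selectivity(self, field: str, value: Any) -> float:
        """Estimated fraction of documents matching `field = value`."""
        count = self.value_counts.get(field, {}).get(value, 0)
        return count / self.total_docs if self.total_docs else 0.0


def plan_predicates(
    stats: CollectionStats, predicates: Dict[str, Any]
) -> Tuple[List[Tuple[str, Any]], List[Tuple[str, Any]]]:
    """Split ANDed equality predicates into (pushed_down, evaluated_later)."""
    pushed: List[Tuple[str, Any]] = []
    residual: List[Tuple[str, Any]] = []
    for field, value in predicates.items():
        if stats.selectivity(field, value) < NARROW_THRESHOLD:
            pushed.append((field, value))  # narrow: use the search index
        else:
            residual.append((field, value))  # broad: filter fetched documents
    return pushed, residual


# Made-up statistics for a 1,000,000-document college_students collection.
stats = CollectionStats(
    total_docs=1_000_000,
    value_counts={"last_name": {"Borthakur": 40}, "age": {10: 120_000}},
)
pushed, residual = plan_predicates(stats, {"last_name": "Borthakur", "age": 10})
print(pushed)    # [('last_name', 'Borthakur')]
print(residual)  # [('age', 10)]
```

A fixed threshold keeps the sketch simple; a real planner would more likely compare estimated costs of the alternative access paths.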

Row-Store Cache

We built a cache for the row store that reduces the overhead of metadata operations, lowering query latency when the working set fits in memory.

Consider the following query:

SELECT first_name
FROM college_students
WHERE age = 10

When this predicate is pushed down to the search index, Rockset retrieves the identifiers of the documents matching WHERE age = 10, then fetches the value of the requested field (first_name) for each of those documents from the row store.

Rockset uses RocksDB as its embedded storage engine and stores each document in the row store as a key-value pair, with the document identifier as the key and the document as the value. RocksDB has a built-in in-memory cache, the block cache, which keeps frequently accessed data blocks in memory. A block can contain multiple documents. To read a document, RocksDB performs a metadata lookup, using its internal index and Bloom filters to find the block that holds the document, and then locates the document within that block by its identifier.
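To make the metadata lookup concrete, here is a heavily simplified model of a block-based key-value file. It is not RocksDB code and omits most of RocksDB's structure (multiple SST files and levels, the block cache itself, partitioned filters). It only shows the steps described above: a Bloom filter check that can rule a key out, an index search that finds the one block that may hold the key, and a search inside that block. The block size and filter parameters are arbitrary.

```python
import bisect
import hashlib
from typing import Iterator, List, Optional, Tuple


class BloomFilter:
    """Tiny Bloom filter: k hash positions over an m-bit array."""

    def __init__(self, m_bits: int = 1024, k: int = 3) -> None:
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits)  # one byte per bit, for simplicity

    def _positions(self, key: str) -> Iterator[int]:
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def may_contain(self, key: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(key))


class BlockFile:
    """Sorted (doc_id, value) pairs split into fixed-size blocks."""

    def __init__(self, items: List[Tuple[str, str]], block_size: int = 4) -> None:
        items = sorted(items)
        self.blocks = [items[i:i + block_size] for i in range(0, len(items), block_size)]
        # Index: last key of each block, binary-searched to pick a block.
        self.index = [block[-1][0] for block in self.blocks]
        self.bloom = BloomFilter()
        for key, _ in items:
            self.bloom.add(key)

    def get(self, key: str) -> Optional[str]:
        if not self.bloom.may_contain(key):  # metadata step 1: Bloom filter
            return None
        b = bisect.bisect_left(self.index, key)  # metadata step 2: index search
        if b == len(self.blocks):
            return None
        block = self.blocks[b]
        keys = [k for k, _ in block]
        i = bisect.bisect_left(keys, key)  # search within the block
        if i < len(block) and block[i][0] == key:
            return block[i][1]
        return None


f = BlockFile([(f"doc{i:03d}", f"value{i}") for i in range(10)])
print(f.get("doc007"))  # value7
```

Every successful read in this model pays for the filter check and two binary searches before the value is reached, even when the block is already in memory.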

When the working set fits in memory, this metadata lookup accounts for a significant share of query latency. In high-throughput workloads the cost adds up further, because the lookup is repeated many times within each query.

We designed a companion MVCC cache for the row store that maps document identifiers directly to document values. Lookups served from this cache skip the block cache and its metadata operations, which improves performance for workloads whose working set fits in RAM.
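A companion cache of this kind can be sketched as a map from document identifier to a short list of versions, each tagged with the sequence number at which it was written, so that a reader holding a snapshot sees the newest version no newer than its snapshot. This is a minimal sketch under our own assumptions, not Rockset's implementation: the sequence-number scheme, the write-through `put`, and the `fallback` callable (standing in for the block-based lookup) are hypothetical, and eviction and garbage collection of old versions are omitted.

```python
import threading
from typing import Callable, Dict, List, Optional, Tuple


class MVCCRowCache:
    """doc_id -> [(seqno, value), ...], oldest first; a None value marks a delete."""

    def __init__(self, fallback: Callable[[str, int], Optional[str]]) -> None:
        self._versions: Dict[str, List[Tuple[int, Optional[str]]]] = {}
        self._lock = threading.Lock()
        # Hypothetical slow path (e.g. a block-based lookup) used on a cache miss.
        self._fallback = fallback

    def put(self, doc_id: str, seqno: int, value: Optional[str]) -> None:
        """Record a new version. Assumes every write for a cached document
        goes through put (write-through) with increasing seqnos."""
        with self._lock:
            self._versions.setdefault(doc_id, []).append((seqno, value))

    def get(self, doc_id: str, snapshot: int) -> Optional[str]:
        """Return the newest version with seqno <= snapshot."""
        with self._lock:
            versions = self._versions.get(doc_id)
            if versions:
                for seqno, value in reversed(versions):
                    if seqno <= snapshot:
                        return value  # hit: no block lookup, no metadata lookup
        # Miss, or every cached version is newer than the snapshot.
        return self._fallback(doc_id, snapshot)


def slow_lookup(doc_id: str, snapshot: int) -> Optional[str]:
    return "from-storage"  # placeholder for the block-based read path


cache = MVCCRowCache(slow_lookup)
cache.put("d1", 5, "v1")
cache.put("d1", 9, "v2")
print(cache.get("d1", 10), cache.get("d1", 7))  # v2 v1
```

Keeping versions per document lets concurrent readers at different snapshots share one cache entry without blocking writers for long.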

The Cloud Performance Differential

We continue to invest in Rockset's performance to make real-time analytics more affordable and accessible. Our latest release, which takes advantage of 3rd Generation Intel Xeon Scalable Processors, makes Rockset 84% faster than before on the Star Schema Benchmark.

Because Rockset is cloud-native, these performance enhancements reach customers right away, with no infrastructure changes or manual upgrades. Join us this month to see how these improvements affect your data application.
