Monday, March 31, 2025

Real-time insights require real-time processing. Rockset delivers data latency of just 70 milliseconds while ingesting streaming data at up to 20 MB/s, making it well suited to real-time data ingestion and analytics.

Demand for real-time data processing continues to accelerate: 80 percent of Fortune 100 companies use Apache Kafka to put their data to use in real time. Streaming data is frequently routed to real-time search and analytics databases that act as a serving layer for applications such as fraud detection in fintech, real-time statistics in esports, and personalization in e-commerce. In these latency-sensitive scenarios, even milliseconds of data delay can lead to significant revenue loss and put the business at risk.

As a result, prospective customers ask what end-to-end latency they can expect with Rockset, meaning the time from when data is generated to when it is available for querying. Today, Rockset is releasing a streaming data benchmark with results that surpass industry standards.

Rockset’s ability to ingest and index data within 70 milliseconds is a significant accomplishment, as numerous large enterprises have struggled to achieve similar performance for their mission-critical applications. With this benchmark, Rockset gives enterprises the confidence to build next-generation applications on real-time streaming data from sources such as Apache Kafka, Confluent Cloud, Amazon Kinesis, and others.

Several recent product advancements made it possible for Rockset to achieve streaming data ingestion with millisecond-level latency:

  • Compute-compute separation: Rockset separates streaming ingest compute from query compute and storage for efficiency in the cloud. The new architecture also eliminates duplicate ingestion tasks, significantly reducing the CPU overhead of writes.
  • RocksDB: Rockset is built on RocksDB, a high-performance embedded storage engine. Rockset recently upgraded to RocksDB 7.8.0+, which introduces several improvements that reduce write amplification.
  • Data parsing: Rockset supports schemaless ingestion and a range of data formats, including JSON, Parquet, and Avro, with complex, deeply nested structures. Rockset uses custom-built, efficient data parsers to convert incoming data into its proprietary format during ingestion.

This blog post details the testing configuration, results, and performance boosts that enabled Rockset to achieve a remarkable 70 ms data latency at a throughput rate of 20 MB/s.

Elevate Efficiency in Real-Time Search and Analytics through Data-Driven Insights

Two characteristics define real-time search and analytics databases: data latency, the time between when data is generated and when it can be queried, and query latency, the time between submitting a query and receiving the results.

Data latency is the time between when data is generated and when it is available for querying in the database. In real-time scenarios, every millisecond counts: it can make the difference between catching fraudsters before they disappear, keeping gamers engaged with adaptive gameplay, and surfacing personalized products based on online activity.

Query latency is the time a system takes to execute a query and return the result. Applications need low query latency to deliver snappy, responsive experiences that keep users engaged. Rockset has benchmarked its query latency on an industry-standard benchmark for analytical applications, outperforming ClickHouse and Druid with response times as low as 17 milliseconds.

In this blog, we benchmark data latency at different ingestion rates. As more organizations rely on real-time data streams, data latency is becoming a production requirement. Customers have told us that many databases cannot sustain reliable, high-performance ingestion under heavy throughput. These systems often lack purpose-built support for streaming ingestion and have no way to scale as throughput from new event streams grows.

This benchmark aims to demonstrate the feasibility of building low-latency search and analytical capabilities for real-time data processing.

Measuring Throughput and Latency with RockBench

We used RockBench to evaluate the throughput and end-to-end latency of Rockset's streaming ingestion.

RockBench has two components: a data generator and a metrics evaluator. The data generator writes events to the database every second, while the metrics evaluator measures both throughput and end-to-end latency.

RockBench Data Generator

The data generator creates 1.25 KB documents, each representing a single event. At this size, 8,000 writes per second is equivalent to a write rate of 10 MB/s.
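
The arithmetic behind these figures can be sketched in a few lines of Python. This is an illustration only, assuming decimal units (1 MB = 1,000 KB); the function name is ours and is not part of RockBench.

```python
DOC_SIZE_KB = 1.25  # document size stated above


def writes_per_second(throughput_mb_per_s: float,
                      doc_size_kb: float = DOC_SIZE_KB) -> float:
    """Documents per second needed to sustain a given write throughput.

    Assumes decimal units: 1 MB = 1,000 KB.
    """
    return throughput_mb_per_s * 1000 / doc_size_kb


rate_at_10 = writes_per_second(10)  # documents per second at 10 MB/s
rate_at_20 = writes_per_second(20)  # documents per second at 20 MB/s
```

Under these assumptions, 10 MB/s works out to 8,000 documents per second and 20 MB/s to 16,000.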

To mirror semi-structured data in realistic scenarios, each document has 60 fields containing nested objects and arrays. Each document also contains several fields used to calculate end-to-end latency:

  • _id: the unique identifier of the document
  • _event_time: the clock time of the machine running the generator
  • generator_identifier: a 64-bit random number
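
As a rough sketch of what such a document might look like, the Python below builds one event with the three fields listed above. It is not RockBench's actual code: the function name, the payload fields, and the choice of microseconds for _event_time are assumptions made for illustration.

```python
import random
import time
import uuid


def make_document(generator_identifier: int) -> dict:
    """Build one event document carrying the three latency-related fields.

    The payload is a hypothetical placeholder; the real generator fills
    each document with 60 fields of nested objects and arrays.
    """
    return {
        "_id": uuid.uuid4().hex,                 # unique document identifier
        "_event_time": time.time_ns() // 1_000,  # generator clock; microseconds assumed
        "generator_identifier": generator_identifier,
        "payload": {                             # hypothetical nested content
            "tags": ["a", "b"],
            "device": {"os": "linux", "version": 3},
        },
    }


# One 64-bit random number per generator instance.
generator_identifier = random.getrandbits(64)
doc = make_document(generator_identifier)
```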

The _event_time of each document is then subtracted from the current time of the evaluator machine to determine the data latency for that document. This measurement also includes round-trip latency, the time required to run a query and receive results from the database. Metrics are published to a Prometheus server, where the p50, p95, and p99 latencies are calculated across all evaluators.
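
The calculation can be sketched as follows. This is a minimal illustration, assuming timestamps in integer microseconds and a simple nearest-rank percentile; the function names are ours, and Prometheus may compute quantiles differently.

```python
def data_latency_ms(event_time_us: int, now_us: int) -> float:
    """Data latency of one document: evaluator clock minus _event_time."""
    return (now_us - event_time_us) / 1000


def percentile(latencies_ms: list[float], p: int) -> float:
    """Nearest-rank percentile for an integer p between 1 and 100."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = (p * len(ordered) + 99) // 100  # ceil(p * n / 100) without floats
    return ordered[rank - 1]


# Made-up timestamps in microseconds, for illustration only.
samples = [data_latency_ms(1_000_000, 1_000_000 + delay_us)
           for delay_us in (60_000, 70_000, 65_000, 90_000)]
summary = {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}
```

Because the evaluator reads _event_time back by querying the database, the measured value naturally includes the query round trip described above.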

The data generator performs inserts only: it adds new documents to the database and does not update or replace existing ones.

Rockset Configuration and Results

When processing streaming data, every database must trade off throughput against latency: higher throughput generally comes at the cost of latency, and lower latency at the cost of throughput.

We recently benchmarked Rockset against Elasticsearch for maximum throughput. In this benchmark, we minimized data latency to show how Rockset performs in use cases that require the freshest data possible.

We ran the benchmark with a batch size of 10 documents per write request against an initial Rockset collection of 300 GB. The benchmark sustained ingestion throughput of 10 MB/s and 20 MB/s and recorded data latency at the p50, p95, and p99 percentiles.
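
To give a sense of the request rate this implies, the sketch below combines the batch size with the 1.25 KB document size. It assumes decimal units (1 MB = 1,000 KB) and evenly spaced requests from a single stream; the function names are hypothetical, and a real run may spread the load across several generators.

```python
DOC_SIZE_KB = 1.25  # document size used by the data generator
BATCH_SIZE = 10     # documents per write request


def write_requests_per_second(throughput_mb_per_s: float,
                              doc_size_kb: float = DOC_SIZE_KB,
                              batch_size: int = BATCH_SIZE) -> float:
    """Write requests per second needed to sustain the target throughput."""
    docs_per_second = throughput_mb_per_s * 1000 / doc_size_kb
    return docs_per_second / batch_size


def batch_interval_ms(throughput_mb_per_s: float) -> float:
    """Spacing between write requests if they are sent at an even pace."""
    return 1000 / write_requests_per_second(throughput_mb_per_s)


requests_at_10 = write_requests_per_second(10)
requests_at_20 = write_requests_per_second(20)
```

Under these assumptions, 10 MB/s corresponds to 800 write requests per second and 20 MB/s to 1,600.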

The benchmark ran on XL and 2XL virtual instances, which are dedicated allocations of compute and memory. The XL virtual instance has 32 vCPUs and 256 GB of memory, while the 2XL has 64 vCPUs and 512 GB of memory.

The p50, p95, and p99 latency results from the Rockset benchmark are summarized below.

[Table: benchmark results]
[Figure: benchmark results bar chart]

At the highest throughput tested, 20 MB/s, Rockset achieved a data latency of approximately 70 milliseconds. As throughput increases and the virtual instance size scales up, Rockset maintains similar data latency. The p95 and p99 data latencies are also clustered close together, showing predictable performance.

Rockset Performance Enhancements

Several performance enhancements within Rockset make this millisecond-level data latency possible.

Compute-Compute Separation

Rockset recently introduced a new cloud architecture for real-time analytics: compute-compute separation. The architecture lets customers spin up multiple isolated virtual instances on the same shared data. With it, customers can separate the compute used for streaming ingestion from the compute used for queries, ensuring performance that is both high and predictable. Because ingestion and queries no longer compete for compute, customers do not need to overprovision or add replicas to protect performance.

With this new architecture, we were able to eliminate duplicate tasks in the ingestion process, so that data parsing, transformation, indexing, and compaction each happen only once. This significantly reduces the CPU overhead required for ingestion while maintaining reliability, giving customers better price-performance.

RocksDB Upgrade

Rockset uses RocksDB as its embedded storage engine under the hood. The team at Rockset created and open-sourced RocksDB while at Facebook, and it is now used by many other web-scale companies, including LinkedIn, Netflix, and Pinterest. Rockset chose RocksDB for its performance and its ability to handle frequently changing data efficiently. Rockset now uses the latest version of RocksDB (7.8.0+), which reduces write amplification by more than 10%.

Previous versions of RocksDB used a partial merge compaction algorithm, which picks one file from the source level and compacts it into the next level. Compared to a full merge compaction, this produces a smaller compaction footprint and allows more parallelism. However, it also results in write amplification.

[Figure: previous RocksDB merge compaction algorithm]

In RocksDB 7.8.0 and later, the compaction output file can be cut earlier and is allowed to be larger than targeted_file_size, so that compaction output files align with the files in the next level. This reduces write amplification by more than 10 percent.

[Figure: new RocksDB merge compaction algorithm]

With the upgraded version of RocksDB, write amplification drops significantly, and the resulting gain in ingest performance is reflected in the benchmark results.
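
To make the effect concrete, write amplification is the ratio of bytes written to storage to bytes ingested by the application. The sketch below applies a 10 percent reduction to a baseline factor; the baseline of 20 is a made-up number for illustration, not a measured Rockset or RocksDB value, and the function names are ours.

```python
def write_amplification(bytes_written_to_storage: float,
                        bytes_ingested: float) -> float:
    """Ratio of bytes written to storage to bytes ingested."""
    if bytes_ingested <= 0:
        raise ValueError("bytes_ingested must be positive")
    return bytes_written_to_storage / bytes_ingested


def storage_write_rate_mb_per_s(ingest_mb_per_s: float, factor: float) -> float:
    """Storage write rate implied by an ingest rate and an amplification factor."""
    return ingest_mb_per_s * factor


baseline = write_amplification(200, 10)  # hypothetical baseline factor
reduced = baseline * (1 - 0.10)          # after a 10 percent reduction

# Storage writes at 20 MB/s of ingest, before and after the reduction.
before = storage_write_rate_mb_per_s(20, baseline)
after = storage_write_rate_mb_per_s(20, reduced)
```

With these hypothetical inputs, the same 20 MB/s of ingest produces roughly 360 MB/s of storage writes instead of 400 MB/s, leaving more CPU and I/O headroom for ingestion.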

Custom Parsers

Rockset supports schemaless ingestion and a wide variety of data formats, including JSON, Parquet, Avro, XML, and more. Because Rockset natively supports SQL on semi-structured data, there is less need for upstream pipelines that add latency and complexity. To make this data queryable, Rockset converts it into a standard proprietary format at ingestion time using data parsers.

Data parsers are responsible for retrieving and parsing data so that it can be indexed and queried. Rockset's legacy data parsers relied on open-source components that did not use memory or compute efficiently. The legacy parsers also converted data into an intermediate format before converting it again into Rockset's proprietary format. To minimize latency and compute, the data parsers were rewritten as custom implementations. The custom data parsers are substantially faster, cutting data latency in half and contributing to the results achieved on this benchmark.

How Performance Enhancements Benefit Customers

Rockset enables businesses across industries to build applications on streaming data with predictable, high-performance ingestion. Examples of latency-sensitive applications built on Rockset span the insurance, gaming, financial services, and healthcare industries:

  • Insurance: As the industry digitizes, insurers are offering customized policies that adapt to individual risk profiles and can be adjusted in real time. A leading Fortune 500 insurance provider analyzes numerous risk factors to deliver instant quotes, which requires data latency below 200 ms.
  • Gaming: Real-time leaderboards boost gamer engagement and retention by using live metrics. A leading esports gaming company requires 200 ms data latency to show how games progress in real time.
  • Financial services: Financial management software helps businesses and individuals monitor their financial health and track where their money goes. A leading Fortune 500 company uses real-time analytics to provide a 360-degree view of finances, showing the latest transactions in under 500 ms.
  • Healthcare: Health information and patient profiles change constantly as test results, medication updates, and patient interactions come in. A leading healthcare company works with research teams to monitor and track patients remotely in real time, with a data latency requirement of under 2 seconds.

Rockset's scalable architecture supports ingestion of high-velocity data streams without compromising query performance. As a result, companies across industries are unlocking the value of real-time streaming data in an efficient, accessible way. We are excited to keep pushing data latency lower and to share this milestone: 70 ms data latency on 20 MB/s of streaming data ingestion with Rockset.

With Rockset, you get these performance enhancements automatically, without tuning your infrastructure or performing manual upgrades.

