Rockset is a cloud-based, real-time search and analytics database that lets users query large volumes of data in seconds. Rockset helps customers achieve strong price-performance by scaling compute and storage independently to match their workloads. This separation improves efficiency and flexibility, but implementing it in a real-time system is hard: real-time systems have traditionally relied on tightly coupled, directly attached storage to serve fresh updates with fast data access.
This blog post explores how Rockset achieves compute-storage separation while providing real-time access to data for querying purposes.
Achieving Compute-Storage Separation Without Performance Degradation
Databases have traditionally been designed for machines with directly attached storage. This setup gives them high bandwidth and fast access to data when they need it. Modern SSDs are very good at serving large numbers of small random reads, which suits index-based access patterns. The architecture works well on-premises, where resources are provisioned up front and workloads are bounded by the available capacity. In a cloud-first environment, however, capacity and infrastructure must adapt flexibly to changing workloads.
A tightly coupled compute-storage architecture creates several challenges for real-time search and analytics:
- Overprovisioned resources: because compute and storage cannot be scaled independently, one or the other ends up overprovisioned.
- Slow scaling: bringing additional resources online and warming them up takes time, so capacity must be planned ahead of peak demand.
- Duplicated data: running multiple compute clusters over the same dataset requires each cluster to hold its own copy of the underlying data.
What if all data lived in a shared location accessible to every compute node, while still delivering SSD-like performance? That would address all of the concerns above.
Compute clusters could then scale with the workload rather than with the size of the data, without downloading the entire dataset on every scaling event. Larger datasets would simply require more storage capacity, and multiple compute nodes could work off the same data without increasing the number of underlying copies. As an added benefit, cloud hardware could be provisioned specifically for compute performance or for storage performance.
What’s the Core of Rockset’s Cloud-Native Architecture?
Rockset’s cloud-native architecture is built from the ground up to take advantage of the scalability and flexibility of the cloud. It is composed of services that can be developed, tested, deployed, and scaled independently.
Rockset separates compute from storage. Virtual Instances (VIs) are allocations of compute and memory resources responsible for ingesting, transforming, and querying data. Separately, the hot storage layer is made up of many storage nodes with attached SSDs for fast access.
Under the hood, Rockset uses RocksDB as its embedded storage engine, which is designed for mutability. Rockset built RocksDB-Cloud on top of RocksDB to take advantage of cloud-native environments: RocksDB-Cloud provides durability in the face of machine failures by integrating with cloud object stores such as Amazon S3.
This enables a tiered storage architecture in which one copy of hot data is stored on SSDs for performance and a copy is kept in S3 for durability. The tiered design is a big part of the price-performance Rockset delivers to customers.
When designing the hot storage layer, we set the following requirements:
- Performance comparable to tightly coupled compute-storage architectures
- No performance degradation during deployments or when scaling up or down
- Fault tolerance
At Rockset, we use RocksDB as the storage engine for our column-store index, which lets us store, query, and aggregate over large amounts of data efficiently. RocksDB gives us fast ingestion with low write amplification, and its support for in-place updates and snapshots helps with consistency and fault tolerance.
RocksDB is a popular Log-Structured Merge-tree (LSM) storage engine optimized for high write rates. New writes are first stored in an in-memory memtable. When a memtable fills up to a size threshold, it is flushed and its contents are written out as an immutable Sorted String Table (SST) file. To keep real-time updates from being blocked by the slower process of creating and distributing SST files, Rockset performs fine-grained replication of the memtable itself.
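To make the write path concrete, here is a minimal sketch of the LSM idea described above: writes land in a mutable in-memory memtable, and a full memtable is flushed out as an immutable, sorted file. The class name and the flush threshold are illustrative, not Rockset's or RocksDB's actual code.

```python
MEMTABLE_LIMIT = 4  # flush after this many keys (tiny, for illustration)

class LsmWriter:
    def __init__(self):
        self.memtable = {}   # mutable, in-memory
        self.sst_files = []  # immutable, sorted on-disk tables

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self.flush()

    def flush(self):
        # An SST file is the sorted key/value contents of a memtable; once
        # written it never changes, which makes it easy to cache and replicate.
        sst = sorted(self.memtable.items())
        self.sst_files.append(sst)
        self.memtable = {}

writer = LsmWriter()
for i in range(10):
    writer.put(f"k{i}", f"v{i}")
print(len(writer.sst_files), "SST files,", len(writer.memtable), "keys still in memtable")
```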
SST files are compressed into uniform-size blocks for storage efficiency. SST files are never modified in place; instead, RocksDB periodically runs compaction, which writes new SST files containing the latest data and deletes the old ones. Much like garbage collection in programming languages, compaction removes outdated versions of the data and keeps the database from growing without bound.
Newly created SST files are uploaded to Amazon S3 for durability. The hot storage layer then fetches those files from S3. Because SST files are immutable, the hot storage layer's job stays simple: discover and download newly created SST files and delete the ones that are gone.
When serving queries, RocksDB reads data blocks, identified by their offset and size within a file, from the underlying storage layer. RocksDB also caches recently accessed blocks on the compute node for fast retrieval.
RocksDB also stores metadata files, such as the MANIFEST file, which track the current state and version of the database. There are only a handful of these per database instance and they are small. Unlike SST files, metadata files are mutable: they are updated as new SST files are created, but they are read infrequently and never during query execution.
Unlike SST files, metadata files are stored locally on compute nodes and in S3 for durability, but not in the hot storage layer. Because metadata files are small and rarely read from S3, storing them on compute nodes has negligible impact on scalability or cost. This also simplifies the hot storage layer significantly, since it only has to handle immutable files.
Data Placement in the Hot Storage Layer
At a very high level, the hot storage layer is a cache of Amazon S3. Data is considered durable once it is written to S3 and can always be re-downloaded from S3 into the hot storage layer on demand. Unlike a traditional cache, though, the hot storage layer uses a range of techniques to achieve a cache hit rate of close to 100%.
RocksDB on the compute nodes works hand in hand with the hot storage layer, which is responsible for keeping the blocks queries need close at hand.
Each Rockset collection (the equivalent of a table in the relational world) is divided into slices, where each slice consists of a set of SST files. A slice comprises all blocks belonging to those SST files. The hot storage layer uses slices as its unit of data placement.
Rendezvous hashing is used to map each slice to the storage nodes that own it, typically a primary and a secondary owner. Compute nodes use the same hash to determine which storage nodes to contact for the data they need. The algorithm works as follows:
- Each collection slice and each storage node is assigned a fixed, unique ID. These IDs never change.
- For a given slice, the slice ID is hashed together with each storage node's ID.
- The resulting hashes are sorted.
- The top two storage nodes in the ordered Rendezvous Hashing list are the primary and secondary storage owners of the slice.
In other words, each slice induces its own ordering of all storage nodes, and the top of that ordering determines which nodes own it. A minimal sketch of the idea follows.
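This sketch assumes SHA-256 as the hash function and illustrative node and slice IDs; the actual hash and ID formats Rockset uses are not specified in this post.

```python
import hashlib

def rendezvous_order(slice_id: str, node_ids: list[str]) -> list[str]:
    """Return all storage nodes ordered by their rendezvous hash for a slice.

    The first two entries are the primary and secondary owners. Because each
    node is scored independently, adding or removing a node only changes the
    ordering for roughly 1/N of the slices.
    """
    def score(node_id: str) -> int:
        digest = hashlib.sha256((node_id + slice_id).encode()).hexdigest()
        return int(digest, 16)

    return sorted(node_ids, key=score, reverse=True)

nodes = ["node-a", "node-b", "node-c", "node-d"]
ordering = rendezvous_order("collection-42/slice-7", nodes)
primary, secondary = ordering[0], ordering[1]
print(primary, secondary)
```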
We chose Rendezvous hashing for data placement because it has several attractive properties:
- Changes to the set of storage nodes have limited impact. When a node is added to or removed from the hot storage layer, slice ownership changes for only about 1/N of the slices, where N is the number of nodes in the hot storage layer. This lets the hot storage layer scale up and down quickly.
- Recovery after a node failure is fast, because the work of restoring the failed node's slices is spread across all remaining nodes, which can work in parallel.
- When a newly added storage node takes over ownership of a slice, it is easy to compute which node owned it previously: the ordered Rendezvous hashing list simply shifts by one position. Compute nodes can keep fetching blocks from the previous owner while the new storage node warms up.
- Each component of the system can determine where a file lives on its own, without any coordination. The only metadata needed is the slice ID and the IDs of the available storage nodes. This is especially useful when placing new data; a centralized placement service would add latency and be another point of failure.
Storage nodes operate at the slice and SST-file level: they always fetch all SST files for the slices they own. Compute nodes, by contrast, fetch only the blocks each query needs. Storage nodes therefore need only limited knowledge of the physical layout of the database, enough to know which SST files belong to a given slice, and rely on compute nodes to specify block boundaries in their RPC requests.
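To make that division of knowledge concrete, here is a small sketch of what a block request might carry versus what a storage node tracks. The field names and the RPC shape are illustrative assumptions, not Rockset's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockRequest:
    slice_id: str   # which slice the block belongs to
    sst_file: str   # which SST file within that slice
    offset: int     # byte offset of the block within the file
    size: int       # block length in bytes

# A storage node only needs to know which SST files make up each slice it
# owns; the block boundaries come from the compute node in the request.
slice_to_sst_files = {
    "collection-42/slice-7": {"000123.sst", "000124.sst"},
}

def can_serve(node_slices: dict, req: BlockRequest) -> bool:
    return req.sst_file in node_slices.get(req.slice_id, set())

print(can_serve(slice_to_sst_files,
                BlockRequest("collection-42/slice-7", "000123.sst", 4096, 32768)))
```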
Designing for Reliability, Performance, and Storage Efficiency
Like any critical distributed system, the hot storage layer needs to be available and performant at all times. The real-time analytics applications built on Rockset have demanding reliability and latency requirements, and those translate directly into requirements on the hot storage layer. Since we can always recover data from S3, the hot storage layer's job is to serve read requests reliably and with disk-like latency.
Maintaining performance with compute-storage separation
Keeping Rockset performant with compute and storage separated means minimizing the impact of network calls and the latency of fetching data that isn't on local disk. Block requests now travel over the network, which can be slower than a local disk read; this is exactly why compute nodes in many real-time systems keep their datasets on locally attached storage. Rockset mitigates the impact of network calls with caching, read-ahead, and parallelization.
To expand the cache space available on compute nodes, Rockset adds an SSD-backed persistent secondary cache (PSC), so that even large datasets can be served locally. A compute node therefore has both an in-memory block cache and a PSC. The PSC reserves a fixed amount of disk space on the compute node for RocksDB blocks that have recently been evicted from the in-memory block cache. Unlike the in-memory block cache, the PSC survives process restarts, so performance stays consistent after a restart without re-fetching cached data from the hot storage layer.
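Here is a minimal sketch of such a two-tier cache: an in-memory LRU whose evictions spill into a larger SSD-resident LRU instead of being dropped. The capacities, eviction policy details, and promotion-on-hit behavior are illustrative assumptions, not Rockset's implementation.

```python
from collections import OrderedDict

class TieredBlockCache:
    """Two-tier cache sketch: small in-memory LRU backed by a larger SSD LRU."""

    def __init__(self, mem_capacity: int, ssd_capacity: int):
        self.mem = OrderedDict()   # in-memory block cache
        self.ssd = OrderedDict()   # persistent secondary cache (PSC) on SSD
        self.mem_capacity = mem_capacity
        self.ssd_capacity = ssd_capacity

    def get(self, block_id):
        if block_id in self.mem:
            self.mem.move_to_end(block_id)
            return self.mem[block_id]
        if block_id in self.ssd:
            # Promote back to memory on an SSD hit.
            value = self.ssd.pop(block_id)
            self.put(block_id, value)
            return value
        return None  # miss: caller fetches from the hot storage layer

    def put(self, block_id, value):
        self.mem[block_id] = value
        self.mem.move_to_end(block_id)
        if len(self.mem) > self.mem_capacity:
            evicted_id, evicted_value = self.mem.popitem(last=False)
            self.ssd[evicted_id] = evicted_value       # spill to SSD
            if len(self.ssd) > self.ssd_capacity:
                self.ssd.popitem(last=False)           # drop oldest from SSD
```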
Query execution is also designed to hide the cost of requests that traverse the network, using prefetching and parallelism. Blocks are fetched in parallel while compute nodes process the data they already have, masking the round-trip latency of talking to another node. Multiple blocks are fetched in a single request, reducing the number of RPCs and increasing data throughput. And compute nodes can fetch blocks from the local PSC, bounded by SSD bandwidth, and from the hot storage layer, bounded by network bandwidth, at the same time.
Fetching a block that is present in the hot storage layer is roughly 100x faster than a read miss that goes to S3: under 1 ms versus around 100 ms. In a real-time system like Rockset, it is therefore essential to keep S3 downloads out of the query path.
When a compute node requests a block of a file that the hot storage layer does not yet have, the storage node must first download the SST file from S3 before it can return the block. To meet the latency requirements of our customers, the blocks queries need must already be in the hot storage layer before compute nodes ask for them. The hot storage layer achieves this through three mechanisms (sketched in code after the list):
- Compute nodes send a synchronous prefetch request to the hot storage layer whenever a new SST file is created, which happens on memtable flushes and compactions. RocksDB does not commit the memtable flush or compaction until the hot storage layer has downloaded the file, guaranteeing that the file is available before any compute node can request blocks from it.
- When a storage node discovers a slice it hasn't seen before, because a compute node sent a prefetch or read block request for a file belonging to it, it proactively scans S3 and downloads the remaining files for that slice. All files for a slice share the same S3 prefix, which makes this straightforward.
- Storage nodes periodically scan S3 to stay in sync with the slices they own: any missing files are downloaded, and local files that no longer exist in S3 are removed.
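The following sketch shows how a storage node might implement the three mechanisms above. The S3 client, the local disk interface, and the file layout are illustrative placeholders, not Rockset's actual internals.

```python
class StorageNode:
    """Keeps the slices this node owns warm, ahead of compute-node reads."""

    def __init__(self, s3, local_disk):
        self.s3 = s3             # assumed to offer get(key) and list(prefix)
        self.disk = local_disk   # assumed to offer has/store/delete/keys(prefix)
        self.known_slices = set()

    def handle_prefetch(self, slice_id: str, sst_file: str) -> None:
        # 1. Compute nodes prefetch newly created SSTs (flush/compaction),
        #    so the file is local before any block is requested from it.
        self.ensure_slice(slice_id)
        if not self.disk.has(sst_file):
            self.disk.store(sst_file, self.s3.get(sst_file))

    def ensure_slice(self, slice_id: str) -> None:
        # 2. The first time a slice is referenced, pull the rest of its files;
        #    all files of a slice share the same S3 prefix.
        if slice_id in self.known_slices:
            return
        self.known_slices.add(slice_id)
        for key in self.s3.list(prefix=slice_id):
            if not self.disk.has(key):
                self.disk.store(key, self.s3.get(key))

    def periodic_sync(self) -> None:
        # 3. Periodically reconcile local files with S3: download anything
        #    missing, delete local files that no longer exist in S3.
        for slice_id in self.known_slices:
            live = set(self.s3.list(prefix=slice_id))
            local = set(self.disk.keys(prefix=slice_id))
            for key in live - local:
                self.disk.store(key, self.s3.get(key))
            for key in local - live:
                self.disk.delete(key)
```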
Replicas for Reliability
For availability and reliability, Rockset stores up to two copies of each slice on distinct storage nodes in the hot storage layer. Rendezvous hashing determines the primary and secondary owner of each slice. The primary owner eagerly fetches the slice's files, via the prefetch RPCs sent by compute nodes and by scanning S3. The secondary owner downloads a file only once it has been read by a compute node. During cluster resizing, the previous owner of a slice keeps its copy until the new owners have downloaded the data; during that window, compute nodes use the previous owner as a failover location for block requests, so availability is never interrupted.
While designing the hot storage layer, we found we could reduce storage costs without sacrificing resiliency by keeping second copies of only part of the data. To make sure the data queries need stays available even if a copy is lost, we use an LRU data structure: a fixed amount of disk space in the hot storage layer is reserved as an LRU cache of secondary copies. From production testing we found that keeping secondary copies for roughly 30-40% of the data, together with the in-memory block cache and the PSC on compute nodes, is enough to avoid going to S3 even when a storage node fails.
We reduce the required disk capacity further by sizing these LRUs dynamically. In most storage systems, some disk space is set aside as a buffer for ingesting and downloading new data. We made the hot storage layer more space-efficient by letting the LRUs resize dynamically to provide that buffer: as more space is needed for ingesting or downloading data, the LRUs shrink the space devoted to secondary copies. This way Rockset puts the spare disk capacity on storage nodes to work instead of leaving it idle.
We also decided to hold primary copies in LRUs, to handle the case where data grows faster than storage capacity can be added. If the aggregate ingestion rate outpaces how quickly the hot storage layer can scale, Rockset could in theory run out of disk space; keeping primary copies in an LRU lets Rockset evict primary copies of data that hasn't been queried recently (it is still safe in S3), freeing space so that ingestion and queries keep working.
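A rough sketch of how a storage node's disk might be budgeted along these lines appears below. The split between primary copies, secondary copies, and buffer space, and the resize rule, are illustrative assumptions rather than Rockset's actual numbers or code.

```python
from collections import OrderedDict

class DiskBudget:
    """Disk budget sketch: primary LRU, secondary LRU, and reclaimable buffer."""

    def __init__(self, total_bytes: int):
        self.total = total_bytes
        self.primary_lru = OrderedDict()    # file -> size; evicted only as a last resort
        self.secondary_lru = OrderedDict()  # file -> size; evicted first under pressure

    def used(self) -> int:
        return sum(self.primary_lru.values()) + sum(self.secondary_lru.values())

    def touch_primary(self, file_id: str, size: int) -> None:
        self.primary_lru[file_id] = size
        self.primary_lru.move_to_end(file_id)   # mark most recently used

    def touch_secondary(self, file_id: str, size: int) -> None:
        self.secondary_lru[file_id] = size
        self.secondary_lru.move_to_end(file_id)

    def reserve_buffer(self, needed_bytes: int) -> None:
        # Free space for ingest/download buffering by shrinking the LRUs.
        # Secondary copies go first; cold primary copies go only if necessary,
        # which is safe because every file still has a copy in S3.
        while self.total - self.used() < needed_bytes and self.secondary_lru:
            self.secondary_lru.popitem(last=False)
        while self.total - self.used() < needed_bytes and self.primary_lru:
            self.primary_lru.popitem(last=False)
```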
By duplicating only part of the data and using the available disk space more efficiently, we were able to significantly reduce the cost of the hot storage layer.
The LRU ordering for all files is persisted to disk so that it survives deployments and process restarts. It also allows us to deploy and scale the cluster safely without re-downloading the entire dataset.
In a typical rolling code deployment, the process running the previous version is shut down before the process running the new version starts and comes online. That leaves an awkward window in which the old process has drained and the new one isn't ready yet, forcing a choice between two unattractive options:
- Accept that the data on the storage node is unavailable during that window. Queries could still be served, but only if other storage nodes fetch the needed SST files on demand when compute nodes request them, before the storage node comes back online.
- Move ownership of the node's data to other storage nodes before deploying. This keeps the hot storage layer performant during deployments, but it causes a lot of data movement, which makes deployments much slower. It would also increase our S3 costs due to the volume of GetObject requests.
Deployment strategies designed for stateless services simply don't work for a stateful system like the hot storage layer. Instead, we implemented a zero-downtime deployment process with no data movement that keeps all data available throughout. Here's how it works:
- As a rolling deployment proceeds, a process running the new code version starts on each storage node while the process running the previous version keeps running. Because the new process runs on the same hardware, it has access to all SST files already stored on that node.
- Compute nodes switch over to sending block requests to the new processes.
- Once the new processes are fully up and serving traffic, the old processes are shut down.
Processes running on the same storage node occupy the same position in the Rendezvous hashing ordering, so running two of them side by side requires no data movement. A global configuration parameter, the active version, determines which process on each storage node is the serving one; compute nodes use it to decide where to send requests.
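A minimal sketch of this idea follows, assuming a hypothetical global active-version setting and illustrative process handles; none of these names come from Rockset's code.

```python
class StorageNodeHost:
    """One storage node that can host two code versions sharing the same disk."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.processes = {}                 # code version -> process handle

    def start_version(self, version: str, process) -> None:
        # The new process runs beside the old one on the same hardware, so it
        # sees the same SST files and keeps the node's rendezvous position.
        self.processes[version] = process

    def stop_version(self, version: str) -> None:
        self.processes.pop(version, None)

ACTIVE_VERSION = "v2"                       # global config read by compute nodes

def route_request(host: StorageNodeHost, request):
    # Compute nodes send requests to whichever process matches the active
    # version; flipping ACTIVE_VERSION back is an instant, data-free rollback.
    return host.processes[ACTIVE_VERSION].handle(request)
```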
Beyond avoiding unavailability, this process has valuable operational properties. Deploying a new binary and shifting traffic to it are separate steps: we can bring up the new processes, ramp traffic to them gradually, and, if anything goes wrong, roll back to the previous version quickly without starting new processes or nodes and without moving data. Fast rollbacks limit the impact of any issues.
Scaling the Hot Storage Layer Up and Down
The hot storage layer needs enough capacity to store a copy of every file, so as it approaches its capacity limit, nodes are added to the cluster. Existing nodes then quickly drop the slices that now belong to the new storage node, freeing up space for more data.
The lookup protocol ensures that compute nodes can still find data blocks even while slice ownership is changing. If N storage nodes are added at once, a slice's previous owner will be at most the (N+1)th node in the Rendezvous hashing ordering. A compute node can therefore always find a block, as long as it exists somewhere in the hot storage layer, by also querying the second, third, and up to the (N+1)th node on the list.
When the hot storage layer detects that it is overprovisioned, it reduces the number of nodes to cut costs. Removing a node abruptly would force the remaining nodes to re-download its data from S3, since they would be missing the data the removed node held. To avoid this, the node to be removed first enters a "pre-draining" state:
- The storage node being removed sends its slices to the next storage node in line, according to the Rendezvous hashing ordering.
- Once all slices have been copied to the next storage node, the node is removed from the Rendezvous hashing ordering. This keeps the data available and queryable at all times, even while the hot storage layer scales down.
With this design, the hot storage layer achieves a 99.9999% cache hit rate without needing a full second copy of the data, and Rockset can scale the layer up or down quickly as needed.
To avoid going to S3 in the query path, compute nodes need to fetch the blocks they need from whichever storage node is most likely to have them on its local disk. They do this with an optimistic lookup protocol (sketched in code after the list):
- The compute node sends a disk-only TryReadBlock RPC to the primary owner. The RPC returns an empty result if the block isn't available on the storage node's local disk. In parallel, the compute node sends a BlockExists request to the secondary owner, which returns a boolean indicating whether the block is available there.
- If the primary owner returns the block in the TryReadBlock response, the read is complete. If the primary doesn't have the block but the secondary does, according to the BlockExists response, the compute node sends a ReadBlock RPC to the secondary owner, which completes the read.
- If neither owner can serve the block immediately, the compute node sends a BlockExists RPC to the slice's failover location: the next storage node in the Rendezvous hashing ordering. If the failover node reports that it has the block locally, the compute node reads it from there.
- If one of these three storage nodes has the file locally, the read completes quickly (<1 ms). In the rare case of a complete cache miss, a ReadBlock RPC downloads the file synchronously from S3, which takes 50-100 ms. This preserves query availability at the cost of higher latency for that query.
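Here is a simplified sketch of that lookup, using the RPC names from the prose but with the parallelism, retries, and error handling stripped away; the node objects are illustrative stand-ins, not Rockset's actual client API.

```python
def read_block(block, primary, secondary, failover) -> bytes:
    # In practice the first two calls below are issued in parallel.
    data = primary.try_read_block(block)           # disk-only read; may be empty
    on_secondary = secondary.block_exists(block)   # cheap metadata check, no disk I/O

    if data is not None:
        return data                                # primary had it on local disk
    if on_secondary:
        return secondary.read_block(block)         # disk read on the secondary owner
    if failover.block_exists(block):
        return failover.read_block(block)          # next node in rendezvous order

    # Complete cache miss: ReadBlock downloads the file synchronously from S3
    # (~50-100 ms) so the query still completes, just with higher latency.
    return primary.read_block(block)
```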
Objectives of this protocol:
- Minimize synchronous S3 downloads by using the hot storage layer whenever the needed blocks are already there. To maximize the chance of finding an available copy of a block, the compute node can contact more than one failover storage node if needed.
- Reduce load on storage nodes. Disk I/O bandwidth is a precious resource on storage nodes, so only the storage node that will actually serve the request should read from its local disk. BlockExists is a lightweight operation that requires no disk access.
- Reduce network traffic. Only one storage node returns the requested data, so no bandwidth is wasted. Sending TryReadBlock to both the primary and secondary owners up front would save a round trip when the primary doesn't have the block but the secondary does, but it would double the data sent over the network for every block read. Since the primary owner serves the vast majority of requests, that trade-off isn't worthwhile.
- Keep the primary and secondary owners in sync with S3. If the underlying file isn't available locally, the TryReadBlock and BlockExists RPCs trigger an asynchronous download from S3, so the file will be available for future requests.
The compute node also remembers which storage node served previous reads and sends subsequent requests as a single TryReadBlock RPC to that known-good storage node, skipping the unnecessary BlockExists RPC to the secondary owner.
The hot storage layer significantly reduces data access times compared to going to S3, thanks to a caching design that keeps frequently accessed data close to compute. Faster access means applications built on Rockset can act on fresh data in real time. Data remains durable and protected, with encryption and replication backing the hot copies, and the layer's scalable architecture lets storage capacity grow or shrink as demands evolve.
Rockset separates compute from storage while achieving performance similar to tightly coupled systems, thanks to its hot storage layer: a cache on top of Amazon S3 designed for a near-100% hit rate. The hot storage layer minimizes the overhead of block requests that now travel over the network instead of to a local disk, and, to keep the price-performance attractive, it avoids unnecessary duplication of data, makes full use of available disk capacity, and scales up and down as needed. Zero-downtime deploys ensure that rolling out new binaries causes no loss of performance.
Because Rockset decouples compute from storage, multiple applications can run on live, shared data in real time. Virtual Instances can be scaled up and down quickly in response to changes in demand and query volume, with no data migration required. Compute and storage can also be sized and scaled independently to minimize cost, which makes this approach more cost-effective than tightly coupled architectures such as Elasticsearch.
Compute-storage separation was also a stepping stone to compute-compute separation: isolating streaming ingest compute from query compute so real-time, streaming workloads never contend with queries for resources. As of this blog's publication, Rockset is the only real-time database to separate both compute-storage and compute-compute.
To learn more about how we use RocksDB at Rockset, check out these blogs:
Authors:
Yashwanth Nannapaneni, Software Engineer at Rockset, and Esteban Talavera, Software Engineer at Rockset