On Thursday, November 2, engineers gathered for Index, the conference on search, analytics, and AI applications at scale, both in person at the Computer History Museum's learning lab and on the livestream.
The conference was a celebration of the engineering behind the applications that have become part of our daily lives. Many talks centered on real-world applications, particularly in search, recommendations, and chatbots, and on the journey from initial deployment through tuning and scaling them for performance. With RocksDB celebrating its tenth anniversary, we were also fortunate to host a panel of engineers who played pivotal roles in the database's early development. Throughout the day, attendees at Index could learn from their peers' successes and setbacks, both in the sessions and in informal conversations.
Design Patterns for Next-Gen Apps
Venkat Venkataramani of Rockset kicked off the day with lessons learned from building at scale: choose the right technology stack, optimize for developer velocity, and design for scalability. He was then joined on stage by Confluent CEO Jay Kreps for a lively discussion on the intersection of data streaming and generative AI. Getting the right data to these applications at the right time is essential: AI applications that need real-time updates, incorporating the latest activity data and new knowledge about the business and its customers, call for a RAG architecture that can index and retrieve that information efficiently as it arrives.
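To make the pattern concrete, here is a minimal sketch in Python of such a loop: new facts are upserted into an index the moment they arrive and retrieved at question time. The `embed` function, the `FreshIndex` class, and `llm_complete` are illustrative stand-ins, not any speaker's actual stack.

```python
# Minimal RAG sketch: upsert new business events as they arrive so they are
# immediately retrievable, then ground the LLM prompt in the latest context.
import math

def embed(text: str) -> list[float]:
    # Toy embedding: hash character trigrams into a small normalized vector.
    vec = [0.0] * 64
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % 64] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def llm_complete(prompt: str) -> str:
    # Stand-in for a real LLM call; echoes the prompt to stay self-contained.
    return f"[answer grounded in]\n{prompt}"

class FreshIndex:
    """Upsert-able vector index: new facts are searchable immediately."""
    def __init__(self) -> None:
        self.docs: dict[str, tuple[list[float], str]] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = (embed(text), text)  # overwrites stale versions

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(
            self.docs.values(),
            key=lambda d: -sum(a * b for a, b in zip(q, d[0])),
        )
        return [text for _, text in ranked[:k]]

def answer(index: FreshIndex, question: str) -> str:
    context = "\n".join(index.search(question))
    return llm_complete(f"Context:\n{context}\n\nQuestion: {question}")

idx = FreshIndex()
idx.upsert("order-17", "Order 17 shipped today and arrives Friday.")
print(answer(idx, "When does order 17 arrive?"))
```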
Speakers from companies such as Uber, Pinterest, and Roblox followed with hands-on experience building and scaling search and AI applications, going deep on the technical details while sharing the lessons they learned along the way. As the conference progressed, several distinct themes emerged from the talks.
Real-Time Evolution
Many presenters noted an evolution in their companies over the past few years toward real-time search, analytics, and AI. Nikhil Garg put it succinctly: real time means two things, low-latency online serving and serving up-to-date rather than precomputed results. Both matter.
In their talks, Sai Ravuru and Ashley Van Name emphasized the importance of streaming data for their applications, while Girish Baliga described how Uber builds a complete pipeline for real-time updates, ingesting data live through Flink and using live indexes to supplement the base indexes. Yexi Jiang highlighted the importance of content freshness in homepage recommendations, where diverse, fresh signals interact to shape what a user sees, such as the effect of a recent social connection or game session on personalized suggestions. Emmanuel Fuentes shared the many real-time challenges they face, including ephemeral content, channel surfing, and the need for low end-to-end latency, all in service of a better user experience.
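One way to picture the live-plus-base index pattern is a query path that merges a streaming-fed live index with a batch-built base index. The sketch below is a deliberately simplified illustration; the class and method names are ours, not Uber's API, and the streaming job is assumed to call `apply_update` for each event.

```python
# Live writes land in a small in-memory index and are visible to queries
# immediately; the base index is replaced wholesale by periodic snapshots.
class SearchIndex:
    def __init__(self) -> None:
        self.base: dict[str, dict] = {}  # rebuilt from snapshots
        self.live: dict[str, dict] = {}  # fed by the streaming job

    def apply_update(self, doc_id: str, doc: dict) -> None:
        self.live[doc_id] = doc  # searchable right away

    def swap_in_snapshot(self, snapshot: dict[str, dict]) -> None:
        # Periodically fold accumulated live writes into a fresh base index.
        self.base = snapshot
        self.live.clear()

    def query(self, predicate) -> list[dict]:
        # Live entries shadow base entries with the same id.
        merged = {**self.base, **self.live}
        return [doc for doc in merged.values() if predicate(doc)]

idx = SearchIndex()
idx.swap_in_snapshot({"d1": {"title": "batch-indexed doc"}})
idx.apply_update("d2", {"title": "freshly streamed doc"})
print(idx.query(lambda doc: "doc" in doc["title"]))
```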
Shu Zhang charted Pinterest's evolution from a feed ranked by time and relevance to a real-time ranking system that responds to a user's query at the moment it is issued.
Pinterest's ad-serving platform operates under tight latency requirements, ranking 500 ads in as little as 100 milliseconds. The benefits of real-time AI also extend beyond user experience: Nikhil and Jaya Kawale both observed that generating suggestions only when they are needed, rather than precomputing them, can make more efficient use of compute resources.
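One generic way to hold a budget like that is a deadline-aware ranking loop that scores as many candidates as time allows and serves the best results found so far. The sketch below illustrates the technique, not Pinterest's implementation; `score` stands in for a real model call.

```python
# Score candidates until the deadline, then return the top-k seen so far,
# degrading gracefully instead of blowing the latency budget.
import heapq
import time

def rank_with_budget(candidates, score, budget_ms=100.0, k=10):
    deadline = time.monotonic() + budget_ms / 1000.0
    top = []  # min-heap of (score, tiebreak, candidate)
    for i, ad in enumerate(candidates):
        if time.monotonic() >= deadline:
            break
        entry = (score(ad), i, ad)
        if len(top) < k:
            heapq.heappush(top, entry)
        else:
            heapq.heappushpop(top, entry)  # keep only the k best
    return [ad for _, _, ad in sorted(top, reverse=True)]

ads = [{"id": i} for i in range(500)]
best = rank_with_budget(ads, score=lambda ad: ad["id"] % 97, budget_ms=100.0)
print(best[:3])
```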
Given the pervasive need for timely data processing, many of the speakers' systems use RocksDB as their storage engine, or took inspiration from it, to achieve real-time performance.
Separation of Indexing and Serving
When operating at scale, where performance matters, companies have turned to separating indexing from serving so that compute-intensive indexing cannot drag down query performance. Sarthak Nandi described how, in Yelp's earlier Elasticsearch deployment, every node acted as both indexer and searcher, and the resulting indexing pressure slowed searches down. Adding replicas did not solve the problem, because every replica shard still had to do the same indexing work, which only increased the overall indexing load.
To overcome these challenges, Yelp rearchitected its search infrastructure to divide the work between primary and replica nodes: indexing requests are routed to primaries, while search requests are distributed among replicas. The primary handles indexing and segment merging, and replicas simply copy the merged segments from the primary. With indexing and serving fully separated, replicas can serve search queries without bearing the cost of indexing.
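In miniature, that division of labor looks something like the toy model below: the primary builds and merges segments, while replicas only copy finished segments and answer searches. The segment format and copy mechanism are drastically simplified stand-ins, not Yelp's implementation.

```python
# Primary does all index building and merging; replicas copy segments.
class Primary:
    def __init__(self) -> None:
        self.segments: list[frozenset[str]] = []
        self.buffer: set[str] = set()

    def index(self, doc: str) -> None:
        self.buffer.add(doc)  # indexing cost is paid here only

    def flush_and_merge(self) -> list[frozenset[str]]:
        if self.buffer:
            self.segments.append(frozenset(self.buffer))
            self.buffer.clear()
        if len(self.segments) > 4:  # toy merge policy
            self.segments = [frozenset().union(*self.segments)]
        return self.segments

class Replica:
    def __init__(self) -> None:
        self.segments: list[frozenset[str]] = []

    def sync(self, primary: Primary) -> None:
        self.segments = list(primary.flush_and_merge())  # copy, never re-index

    def search(self, term: str) -> list[str]:
        return [d for seg in self.segments for d in seg if term in d]

p = Primary()
p.index("cheap pizza downtown")
r = Replica()
r.sync(p)
print(r.search("pizza"))
```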
Uber faced a similar situation, where indexing load on its serving system could affect query performance. In Uber's case, live indexes are periodically written out as snapshots and propagated to the base search indexes. The snapshot computation caused CPU and memory spikes, which required additional resources. Uber addressed this by splitting its search infrastructure into two clusters, one dedicated to serving requests and another to generating snapshots, isolating the serving system from indexing maintenance tasks. This design keeps query traffic unaffected by index updates, ensuring a seamless experience for users.
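The two-cluster split can be sketched as a snapshot cluster that publishes finished snapshots to shared storage and a serving cluster that only ever loads them, so queries never pay the CPU and memory cost of snapshot generation. The paths and JSON format below are illustrative, not Uber's.

```python
# Write-then-rename so the serving side never sees a half-written snapshot.
import json
import os
import tempfile

SNAPSHOT_DIR = tempfile.mkdtemp()  # stand-in for shared object storage

def publish_snapshot(live_index: dict, version: int) -> None:
    # Runs on the snapshot cluster.
    tmp = os.path.join(SNAPSHOT_DIR, f".snap-{version:08d}.tmp")
    with open(tmp, "w") as f:
        json.dump(live_index, f)
    os.rename(tmp, os.path.join(SNAPSHOT_DIR, f"snap-{version:08d}.json"))

def load_latest_snapshot() -> dict:
    # Runs on the serving cluster; zero-padded names sort chronologically.
    snaps = sorted(n for n in os.listdir(SNAPSHOT_DIR) if n.startswith("snap-"))
    if not snaps:
        return {}
    with open(os.path.join(SNAPSHOT_DIR, snaps[-1])) as f:
        return json.load(f)

publish_snapshot({"doc-1": "fresh contents"}, version=1)
print(load_latest_snapshot())
```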
Architecting for Scale
Presenters also shared realizations they came to as their systems scaled, along with the changes that scaling required. Jaya noted that Tubi's content library was once small enough that ranking the entire catalog for all users was feasible with offline batch processing. As the library grew, that became computationally prohibitive, forcing the team to either cap the number of candidates ranked or move ranking to real time. Working on AI-powered workplace search, T.R. Vishwanath and James Simonsen described how growing their search index brought ever-longer crawl backlogs, a performance problem that only appears at high scale. Their challenge was to build a solution that could scale along different dimensions of growth at different cost points. By processing work asynchronously, the separate parts of their crawl can scale independently, letting them allocate resources efficiently and keep prioritizing crawls even when their crawlers are saturated.
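Queue-based decoupling of this kind might look like the sketch below: each stage pulls from its own queue and scales independently, and a priority queue keeps urgent work ahead of the backlog even when crawlers are saturated. The stages and stubs are illustrative, not the speakers' actual pipeline.

```python
# Independent stages connected by queues: scale crawlers without touching
# parsers, and let priorities decide what gets crawled first.
import queue
import threading

crawl_q: "queue.PriorityQueue[tuple[int, str]]" = queue.PriorityQueue()
parse_q: "queue.Queue[str]" = queue.Queue()

def crawler() -> None:
    while True:
        priority, url = crawl_q.get()      # lowest number = most urgent
        parse_q.put(f"<html from {url}>")  # stand-in for the real fetch
        crawl_q.task_done()

def parser() -> None:
    while True:
        _page = parse_q.get()              # stand-in for parse + index
        parse_q.task_done()

# Each stage scales on its own: four crawlers, one parser here.
for _ in range(4):
    threading.Thread(target=crawler, daemon=True).start()
threading.Thread(target=parser, daemon=True).start()

crawl_q.put((0, "https://example.com/fresh-page"))   # urgent recrawl
crawl_q.put((9, "https://example.com/old-backlog"))  # low-priority backlog
crawl_q.join()
parse_q.join()
```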
At scale, cost is often a pressing concern. Discussing storage trade-offs in recommendation systems, Nikhil of Fennel noted that storing everything in memory is prohibitively expensive, so teams should consider disk-based solutions like RocksDB, and, where SSDs cost too much, tiered storage such as Amazon S3. By running their search clusters statelessly on Kubernetes, Yelp's team eliminated ongoing maintenance costs and could scale their infrastructure with fluctuating user traffic, cutting expenses by 50% while improving performance.
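The tiered-storage idea reduces to a lookup that falls through RAM to disk to object storage and promotes hot keys upward. In the sketch below, the two lower tiers are plain dicts standing in for a real RocksDB instance and an S3 bucket; it illustrates the access pattern, not any speaker's system.

```python
# Read path: RAM first, then disk, then cold storage; hot keys get promoted.
class TieredStore:
    def __init__(self, disk: dict, cold: dict, ram_capacity: int = 1024) -> None:
        self.ram: dict[str, bytes] = {}
        self.disk = disk  # stand-in for a RocksDB instance
        self.cold = cold  # stand-in for an S3 bucket
        self.ram_capacity = ram_capacity

    def get(self, key: str):
        for tier in (self.ram, self.disk, self.cold):
            if key in tier:
                value = tier[key]
                self._promote(key, value)  # hot keys migrate to faster tiers
                return value
        return None

    def _promote(self, key: str, value: bytes) -> None:
        if len(self.ram) >= self.ram_capacity:
            self.ram.pop(next(iter(self.ram)))  # naive eviction
        self.ram[key] = value

store = TieredStore(disk={"user:42": b"profile"}, cold={"user:7": b"archived"})
print(store.get("user:7"))  # served from the cold tier, promoted to RAM
```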
While the speakers shared plenty of scaling experiences in their talks, not every obstacle is immediately apparent; companies should anticipate at-scale challenges from the outset and prepare for long-term growth.
Want to Learn More?
At the inaugural Index Conference, attendees heard from top engineering leaders at the forefront of building, scaling, and productionizing search and AI applications. This recap only scratches the surface of what they shared; there is much more to learn from the talks themselves.
Watch the full conference video, and stay tuned for upcoming Index conferences.