I recently had the privilege of moderating a discussion on personalization and recommender systems with two seasoned technical leaders who have built these systems at FAANG and other leading tech companies.
Prabhu is Head of Engineering and Research at a Series C startup building an AI platform for robotics in logistics. As the former CTO of a home-services website, he led a team of 200 and the rebuild of the consumer experience around machine-learning-powered search. Before that, he served as head of core infrastructure. Prabhu has also held search and data engineering roles at Twitter, Google, and Microsoft.
Garg is CEO and co-founder of Fennel AI, a startup building real-time machine learning infrastructure. Before Fennel AI, he was a Senior Engineering Manager at Facebook, where he led a team of over 100 machine learning engineers responsible for ranking and recommendations across multiple product lines, and also led a team of over 50 engineers building an open-source machine learning framework. Before Facebook, Garg was Head of Platform and Infrastructure at Quora, overseeing a team of 40 engineers and managers and owning all technical initiatives and metrics. He regularly shares his insights and guidance online.
In this roundtable, the two shared lessons on recommendations, personalization, and applied machine learning, drawn from years of hands-on experience at the cutting edge of the industry.
Below are insights from Prabhu, Garg, and the attendees who joined the conversation.
This roundtable was the third we have hosted this summer. At Rockset, our CEO Venkat Venkataramani recently moderated a panel of data engineers on the long-running debate between SQL and NoSQL databases in the modern data ecosystem. You can read a summary of the key points and watch the recording.
My colleague Shruti Bhat, Chief Product Officer and SVP of Marketing, moderated a discussion on the benefits, challenges, and implications of batch versus streaming data for organizations today. View the recording.
Thumbtack is an online marketplace that connects consumers with professionals for services like gardening and furniture assembly, typically tasks that require specialized skills, such as building and installing IKEA products. The core experience is less like Uber's and closer to a dating site: a match requires a double opt-in, because a consumer has to confirm their intent to hire, and not every job is desirable to every professional.
In our first version, consumers described their jobs in a semi-structured form, and we shared those jobs with professionals in their local area. Two problems arose with this model. It demanded significant time and effort from the professional, who had to deliberate over which jobs to pursue, and that friction was a drag on our growth. It also made consumers wait, when they are accustomed to instant results with every online transaction. We eventually built a feature called Instant Results, which makes the double opt-in and the matching happen in real time. Instant Results requires two kinds of predictions: the list of home professionals a consumer is likely to find appealing, and the list of jobs a professional is likely to find appealing. That was hard because it meant collecting data across many hundreds of different service categories. We took the straightforward, conventional path and it worked: start with heuristics, then move to machine learning once we understood the problem well enough to make better predictions. This was feasible because our professionals are on the platform multiple times a day. Thumbtack consolidated a patchwork of approaches into a single framework for building real-time matching.
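The heuristic-first progression described above can be sketched as follows. This is a minimal illustration, not Thumbtack's actual system: the scoring logic, field names, and the 1,000-example threshold are all hypothetical.

```python
def heuristic_score(pro, job):
    """Hand-written baseline: prefer nearby pros with good ratings."""
    return pro["rating"] - 0.1 * pro["distance_km"]

def match_score(pro, job, models, min_examples=1000):
    """Use a learned model for the job's category once enough training
    data exists for it; otherwise fall back to the heuristic."""
    model = models.get(job["category"])
    if model is not None and model["n_examples"] >= min_examples:
        return model["predict"](pro, job)
    return heuristic_score(pro, job)

pro = {"rating": 4.8, "distance_km": 3.0}
job = {"category": "gardening"}
print(match_score(pro, job, models={}))  # no model yet: falls back to the heuristic
```

The appeal of this shape is that the launch does not wait on the model: every category works on day one via the heuristic, and categories graduate to the learned model independently as data accumulates.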
When I led the 100-person machine learning product team at Facebook, I had the chance to work on about a dozen distinct ranking and recommendation problems. Once you have worked on enough of them, every problem starts to feel similar. There are nuances, but the problems are more alike than different, and the right abstractions began to emerge on their own. At Quora, I led a machine learning infrastructure team that started small, around 5-7 engineers, and grew substantially over time. We regularly invited our customer teams to our internal staff meetings to share the challenges they were facing. The organization was too reactive rather than proactively shaping its own direction. Once we had identified their challenges, we worked backwards from them and applied systems engineering to determine what needed to be built. The ranking personalization engine is a complex system, but it is mission-critical. That "service" has a substantial amount of business logic baked into it, often in high-performance C++ or Java, and the sheer complexity makes it daunting for people to engage with and contribute to. Much of our work was first-principles: setting aside conventional practice, re-examining our assumptions, tracking the state of the art, and working out how to capitalize on it. Our goal was to make our customers more productive and more efficient, and to give them access to cutting-edge ideas.
Personalization should not be equated with machine learning.
To use a Thumbtack example: you could build a rule-based system that boosts all of a professional's listings within a category whenever that professional receives an unusually high number of reviews. That is personalization, but it is not machine learning. Conversely, you can apply machine learning in ways that are not personalized at all. At Facebook, our team used machine learning to identify the most trending topics, which are the same for everyone. That distinction between machine learning and personalization is often blurred.
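A minimal sketch of that kind of rule-based boost (the names and the review threshold are hypothetical, not Thumbtack's actual logic) — personalization with no learning involved:

```python
def boost_listings(listings, review_counts, threshold=100):
    """Rule-based personalization: promote every listing of any pro whose
    review count exceeds a fixed threshold. No model, no training data."""
    boosted_pros = {pro for pro, n in review_counts.items() if n >= threshold}
    # Boosted listings first; stable sort preserves relative order otherwise.
    return sorted(listings, key=lambda l: l["pro"] not in boosted_pros)

listings = [
    {"pro": "alice", "title": "Gardening"},
    {"pro": "bob", "title": "Furniture assembly"},
    {"pro": "carol", "title": "Tree pruning"},
]
print(boost_listings(listings, {"alice": 12, "bob": 240, "carol": 7}))
```

A single `if` statement and a sort: fully deterministic, trivially debuggable, and yet already "personalized" in the product sense.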
As an industry, we are honestly still figuring out how to decompose these problems. In many companies, the infrastructure and the business logic are written in the same language and compiled into the same binary. No team owns just the infrastructure half of the core system while another owns the business half; it is all mixed together. As a rule of thumb, once a dedicated personalization team grows to around 6-7 people, 1-2 of them naturally gravitate toward infrastructure problems: how the data is stored, what kind of storage and memory to use, how many nines of availability are required. Companies like Facebook and Google are finding ways to separate the infrastructure from the business logic, so that the infrastructure layer is agnostic to the business it serves. We are rediscovering old ideas from database theory, which worked out long ago how to decompose these problems into manageable parts.
Pre-computation requires significant compute and storage, and most of it is wasted, because only a fraction of customers log in within the window it covers.
Every day, the platform would run on the order of 'n choose 2' computations across 'n' customers' interactions, yet only a tiny proportion of those customers log in at any one time. Facebook has better retention than almost any product in recorded history, and even there, pre-computation is a wasted expense.
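A back-of-the-envelope sketch makes the waste concrete. The numbers below are illustrative, not Facebook's: pairwise pre-computation grows as n choose 2, while only the logged-in fraction of users ever sees any of it.

```python
from math import comb

def precompute_cost(n_users: int, daily_active_fraction: float):
    """Pairwise pre-computation cost vs. the share that is never served."""
    pair_computations = comb(n_users, 2)   # n choose 2: grows quadratically in n
    wasted = 1.0 - daily_active_fraction   # work done for users who never log in
    return pair_computations, wasted

# Illustrative: 1 million users, 10% of whom log in on a given day.
pairs, wasted = precompute_cost(1_000_000, 0.10)
print(pairs)   # 499,999,500,000 pairwise computations
print(wasted)  # 0.9 of the pre-computed results go unused
```

Doubling the user base roughly quadruples the pre-computation bill, while the served fraction stays pinned to daily logins — which is the asymmetry the quote is pointing at.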
Companies focus relentlessly on product goals, and they should. If you propose a migration that vaguely promises to deliver something "eventually," months down the line, you will never get approval. You have to figure out how to sequence the migration. One effective approach is to build a brand-new product on top of the new infrastructure. That is how Pinterest's feed moved from an HBase batch feed to a real-time feed built on RocksDB. Don't stress about migrating the legacy infrastructure up front; legacy migrations are notoriously hard because of the multitude of interconnected issues that accumulate over time. Start with new use cases. Within a surprisingly short time, your new infrastructure will carry most of the weight, and the legacy infrastructure will matter far less. When you do migrate, make sure the product or the customer gets incremental value along the way. Even a migration framed as a one-year project should deliver value every three months. That is the simple way I have found to avoid massive migrations. At Twitter, our team once took on a large, complex, big-bang infrastructure migration. It did not go well. Progress was painfully slow. Even though the plan was to abandon the legacy system, we were forced to keep patching and refining it, and in the end we migrated to its successor incrementally.
Certain modules, like training cutting-edge machine learning models, still belong offline, but most serving logic has become a real-time operation. I recently wrote a blog post on the seven reasons real-time machine learning applications are supplanting traditional batch processes. One reason is cost. Another: every time we moved a component of our machine learning system to real time, we saw a notable increase in accuracy. Most products have a long-tailed distribution of customer engagement. A few people use the product heavily; many touch it only sporadically over long stretches of time. For most individuals, you have hardly any data points. If you can use data points from just 60 seconds ago, you can personalize far more effectively, because you effectively have an exponentially larger pool of information.
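One way to picture "data from 60 seconds ago" is a sliding-window counter over a user's event stream, usable as a feature at ranking time. A minimal sketch, assuming timestamped events in seconds:

```python
from collections import deque

class RecentEventCounter:
    """Count a user's events within the last `window_s` seconds.
    The count can be fed to a ranking model as a real-time feature."""

    def __init__(self, window_s: float = 60.0):
        self.window_s = window_s
        self.events = deque()  # timestamps, appended in increasing order

    def record(self, ts: float) -> None:
        self.events.append(ts)

    def count(self, now: float) -> int:
        # Evict everything older than the window, then report what remains.
        while self.events and self.events[0] <= now - self.window_s:
            self.events.popleft()
        return len(self.events)

c = RecentEventCounter(window_s=60.0)
for ts in (0.0, 10.0, 55.0, 70.0):
    c.record(ts)
print(c.count(now=71.0))  # 2 — only the events at 55.0 and 70.0 fall inside the window
```

A batch pipeline would deliver this count hours late; computed online it is at most one event stale, which is the freshness gap the post is describing.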
Batch processing was a straightforward way to process vast amounts of data in parallel, and the infrastructure was readily available. But it is extremely inefficient, and it is not a natural fit for the product experience you are trying to build. One of its biggest drawbacks is the rigid constraint it puts on your developers: it limits how fast they can build and it stifles experimentation, since data takes time to propagate and experiments have to be planned around that lag. The more real-time you are, the faster you can refine your product and the sooner your systems become accurate. That holds whether the product is real-time by nature, like Twitter, or built on more static content, like Pinterest.
People often assume real-time systems are harder to work with and debug, but for those who design them, the opposite is usually true: they are typically more straightforward. Beneath a seemingly simple batch system lies a tangled web of pipelines waiting to be discovered. With a real-time system, when something breaks, you can often just restart the service and read the errors and warnings in the logs. It used to be that scaling real-time systems took considerable engineering effort, but the platforms have since matured to the point where real-time is practical. Even so, few companies run large-scale recommender systems in real time today.
It surprises me every time I see a team default to an offline approach without a second thought, purely for expediency. "We'll just do it in Python. We know the limitations; we can work around them." Six to nine months later, they have built an expensive piece of infrastructure that slows innovation every single day. It is an avoidable mistake that keeps recurring; I have seen it more than ten times. If teams took the time to plan, many would choose an online system over an offline one.
Contrast indexing for a Google search with indexing for a consumer marketplace like Airbnb, Amazon, or Thumbtack. A query begins with the consumer expressing their intent in a few specific phrases. Because the underlying data is mostly semi-structured, you can build a powerful inverted-index-based keyword search with filtering. On Thumbtack, a customer can search for a gardening expert and quickly narrow down to the right professional for their specific need, say, a specialist in orchard management or apple trees. Filtering makes both shoppers and service providers efficient. So you build a system with both search and inverted-index capabilities. Search indexes are often the most versatile tools for product velocity and developer efficiency.
Even in the era of personalized ranking, traditional indexing remains a crucial foundation. When ranking in real time, you can only score a few hundred items while the user waits; the latency budget is a few hundred milliseconds, 500 at most. You cannot ask a machine learning model to score a million items in that window. If the inventory holds 100,000 items and you have no filters, you need a retrieval step that narrows the pool to roughly 1,000 candidates before contextual scoring and ranking. Candidate generation ultimately relies on an index, typically an inverted index, even though it does not start from keywords the way a conventional text search does. On YouTube, for example, a candidate generator might pull videos on topics a user follows — gaming, DIY tutorials, cooking, travel vlogs, beauty — that have received at least 50 likes within some recent window. Different attributes, different time windows: there are a bunch of indices somewhere. The index does not need to be perfect, but it should not be poor either. I still believe indexing is an essential component of any recommender system. It is not a question of indexing versus machine learning.
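A toy sketch of that retrieval step (the data and attribute names are hypothetical, not any real system's schema): an inverted index maps attribute-value pairs to item ids, candidate generation intersects postings lists, and only the small surviving set would be handed to the ranking model.

```python
from collections import defaultdict

def build_inverted_index(items):
    """Map each (attribute, value) pair to the set of item ids having it."""
    index = defaultdict(set)
    for item_id, attrs in items.items():
        for key, value in attrs.items():
            index[(key, value)].add(item_id)
    return index

def candidates(index, required):
    """Intersect the postings lists for all required attribute-value pairs."""
    postings = [index.get(kv, set()) for kv in required]
    return set.intersection(*postings) if postings else set()

items = {
    1: {"category": "gardening", "region": "sf"},
    2: {"category": "gardening", "region": "nyc"},
    3: {"category": "assembly", "region": "sf"},
}
index = build_inverted_index(items)
print(candidates(index, [("category", "gardening"), ("region", "sf")]))  # {1}
```

The intersection touches only the postings lists involved, never the full inventory — which is why this step fits inside a few-hundred-millisecond budget while scoring every item would not.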
Injecting diversity can improve rankings, and it is a fairly common tool in ranking systems.
To gauge this, we ran an A/B test measuring the percentage of users who saw at least one story about an important world event. A diversity metric like that helps prevent over-personalization. Over-personalization is a real risk, but I think it is often used as an excuse to avoid putting machine learning or deeper personalization into products, when constraints like these can simply be applied at the retrieval stage, before the optimization stage.
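One simple way to impose such a constraint during re-ranking (a sketch of the general idea, not Facebook's actual mechanism): cap how many items from any one category may appear in the top results, demoting the overflow.

```python
def rerank_with_diversity(ranked_items, max_per_category, k):
    """Greedy re-rank: walk the model's ranking in order and keep at most
    `max_per_category` items of each category in the top `k` slots."""
    counts, result, overflow = {}, [], []
    for item in ranked_items:
        cat = item["category"]
        if counts.get(cat, 0) < max_per_category:
            counts[cat] = counts.get(cat, 0) + 1
            result.append(item)
        else:
            overflow.append(item)  # demoted; may still fill leftover slots
        if len(result) == k:
            break
    return (result + overflow)[:k]

feed = [
    {"id": 1, "category": "sports"},
    {"id": 2, "category": "sports"},
    {"id": 3, "category": "sports"},
    {"id": 4, "category": "world-news"},
]
print([i["id"] for i in rerank_with_diversity(feed, max_per_category=2, k=3)])  # [1, 2, 4]
```

Because the cap is applied after the model scores but before display, the optimization objective stays untouched — the constraint lives in a separate, auditable layer.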
Personalization comes in degrees, from subtle tweaks to deep tailoring. Take Thumbtack: customers typically complete just a few household jobs a year, so the personalization we apply to them might focus mostly on their geographical location. For home professionals who use the platform frequently, we can use their preferences to tailor the experience much more closely. You should also build a little randomness into any model, so the system keeps exploring and learning from the interaction.
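That "little randomness" is essentially epsilon-greedy exploration. A sketch under the assumption of a scored candidate list (the candidate names and epsilon value are illustrative):

```python
import random

def pick_with_exploration(scored_candidates, epsilon=0.05, rng=random):
    """With probability epsilon, show a uniformly random candidate instead
    of the top-scored one, so the system keeps gathering data on long-shots."""
    if rng.random() < epsilon:
        return rng.choice(scored_candidates)           # explore
    return max(scored_candidates, key=lambda c: c[1])  # exploit

cands = [("pro_a", 0.91), ("pro_b", 0.40), ("pro_c", 0.15)]
print(pick_with_exploration(cands, epsilon=0.05))
```

Without the explore branch, low-scored candidates never get impressions, so the model never collects the data that could revise their scores — the feedback loop this paragraph warns about.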
Machine-learning personalization is ultimately an optimization tool, so the question is: what should it optimize toward? Product teams have to define the vision and set the product goals. If I showed you two variations of a product without telling you which one came from machine learning, how would you rank them? Real-time or batch — which approach will best solve a given problem? In a machine-learning-centric world, answering those questions is the role of product management.