Empowering professionals to thrive in a rapidly evolving work landscape, our reskilling platform delivers seamless end-to-end solutions. By leveraging AI-driven skills assessment, it predicts future skill demands and bridges knowledge gaps with personalized learning pathways and relevant job opportunities. Working closely with industry leaders such as Accenture and Workday, we’ve been recognized as a ‘Cool Vendor’ in Human Capital Management by the research firm Gartner.
We have successfully developed a comprehensive Labor Market Intelligence database featuring a vast repository of information that covers:
- Profiles on approximately 800 million anonymized employees and 40 million firms.
- Approximately 1.6 billion job postings across 150+ global locations.
- Three trillion unique skill combinations needed to meet current and future job requirements.
Every day, our database ingests roughly 16 terabytes of data from a variety of sources: job postings harvested by our web crawlers and paid streaming data feeds. Building on extensive research in advanced analytics and machine learning, we derive valuable insights into current and future global job trends.
Thanks to our pioneering expertise, strong reputation, and strategic partnerships with industry leaders like Accenture, we’re experiencing rapid growth, with two to four new business prospects arriving daily.
Powered by Data and Analytics
Like trailblazing innovators such as Uber, Airbnb, and Netflix, our company is revolutionizing the global Human Resources and Human Capital Management (HR/HCM) landscape by championing a new era of data-driven enterprises.
- A cloud-based platform that empowers workers to develop their professional skills, providing interactive resources and expert guidance on acquiring the skills they need.
- A cloud-based analytics platform offers an intuitive, user-friendly dashboard for executive and HR stakeholders to explore and drill down into insights on a), b) and c).
- A premium offering enabling businesses to tap into more nuanced analytics, including benchmarking against competitors and talent acquisition recommendations to bridge skill disparities.
Despite its flexibility, scalability, and ease of use, MongoDB struggles with analytical queries. On large datasets or complex aggregations, it may fail to return timely results, leading to slow responses or even timeouts.
To mitigate these issues, consider implementing the following strategies:
- Avoid MongoDB for complex analytics: where possible, keep analytics workloads separate from your operational database and use a purpose-built engine such as Hadoop or Spark.
- Use MapReduce cautiously: MongoDB’s built-in MapReduce can process large amounts of data in parallel, but it is rarely the best choice for complex queries.
- Optimize query performance: keep your collections well indexed, minimize joins, use efficient sorting and grouping, and leverage window functions where available.
- Use the aggregation framework: MongoDB’s aggregation pipeline can simplify complex queries and often performs better than MapReduce.
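As an illustration, a skills rollup of the kind described later in this piece could be expressed as an aggregation pipeline. This is a minimal sketch; the collection and field names (`job_postings`, `skills`, `country`) are hypothetical placeholders, not an actual schema:

```python
# Build a MongoDB aggregation pipeline that counts job postings per skill
# within one country. Collection and field names are illustrative only.
pipeline = [
    {"$match": {"country": "GB"}},   # filter to one country
    {"$unwind": "$skills"},          # one document per skill entry
    {"$group": {                     # count postings per skill
        "_id": "$skills",
        "postings": {"$sum": 1},
    }},
    {"$sort": {"postings": -1}},     # most in-demand skills first
    {"$limit": 20},                  # top 20 skills
]

# Executing it requires a live MongoDB instance, e.g.:
# from pymongo import MongoClient
# results = MongoClient().labor_db.job_postings.aggregate(pipeline)
print(len(pipeline))
```

Even with a well-chosen pipeline and supporting indexes, ad-hoc multidimensional queries of this shape were the ones that strained MongoDB at our scale.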
Approximately 16 terabytes of raw text data from our web crawlers and data feeds land in our databases daily. The data is then processed and loaded into our analytics and serving database, MongoDB.
To support advanced analytics across vast datasets of job postings, resumes, skills, and geographic locations, MongoDB query performance needed significant improvement, especially when query patterns were not known in advance. Multidimensional queries and joins became sluggish and expensive, making it impossible to deliver the interactive performance our customers demanded.
For example: can we identify every data scientist worldwide with a clinical-trials background and at least three years of pharmaceutical experience? Not only would such a query have been expensive to run, the customer also expected fast results.
When customers asked about expanding search to non-English-speaking countries, we had to explain that our product could not support it, primarily because of the difficulty of standardizing data across languages within our MongoDB framework.
MongoDB also imposes limits on payload sizes, along with various hardcoded quirks that posed unexpected challenges; for instance, we could not even query Great Britain as a country.
Given these hurdles with query latency and data ingestion in MongoDB, we were eager to pivot to a different approach.
What do we mean by a real-time data stack? A seamless fusion of technologies that empowers real-time insights.
By combining the power of Databricks with the real-time capabilities of Rockset, we can build a real-time data stack that transforms the way organizations extract value from their data.
With Databricks’ robust big-data processing engine and scalable architecture, we can efficiently ingest and process vast amounts of data. Then, by harnessing Rockset’s cloud-native, real-time analytics capabilities, we can rapidly turn that data into actionable insights that drive business decisions.
This real-time data stack enables organizations to respond swiftly to changing market conditions, customer behaviors, or operational challenges, fostering a competitive edge and driving growth.
We required a storage layer capable of supporting large-scale machine learning processing for terabytes of data generated daily.
When comparing Snowflake and Databricks, we ultimately chose Databricks for its seamless compatibility with our tooling and its support for open-source file formats. With Databricks, we established a robust lakehouse architecture of three interconnected tiers that store and process our vast repository of data. Raw data lands in our Bronze layer, where it undergoes preliminary processing before Spark ETL and machine learning pipelines refine and enrich it into our Silver layer. We then compute aggregates across dimensions such as geographic location, job function, and time, and store them in the Gold layer.
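In spirit, the Gold-layer rollup groups Silver-layer records by those dimensions. A minimal stdlib sketch, with made-up records standing in for the Spark DataFrames the real pipelines operate on:

```python
from collections import Counter

# Toy Silver-layer records; in production these are Spark DataFrames and
# the rollup runs as a Spark ETL job. Field names are illustrative only.
silver = [
    {"country": "US", "job_function": "Data Science", "month": "2023-01"},
    {"country": "US", "job_function": "Data Science", "month": "2023-01"},
    {"country": "GB", "job_function": "Nursing",      "month": "2023-01"},
]

# Gold layer: posting counts keyed by (country, job_function, month).
gold = Counter(
    (r["country"], r["job_function"], r["month"]) for r in silver
)

print(gold[("US", "Data Science", "2023-01")])  # → 2
```

Precomputing these coarse-grained aggregates is what keeps the downstream serving layer fast: the expensive scans happen once, in batch, rather than per query.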
We needed sub-second SLAs for query latency, with customers able to execute complex, multifaceted queries in the low hundreds of milliseconds. Spark was not designed for such queries; it processes them as data-intensive jobs that can take tens of seconds to complete. So we sought a real-time analytics platform that would aggregate and curate our data into a single, comprehensive index, empowering us to deliver highly dimensional analytics on a moment’s notice.
We chose to deploy Rockset as our primary, user-facing serving database. Rockset continuously syncs with the Gold layer’s data and rapidly builds a comprehensive index of it. Rockset takes the coarse-grained aggregates in the Gold layer, then joins and refines them across multiple dimensions to support the fine-grained aggregations that customer queries demand.
This empowers us to provide 1) curated, pre-defined content feeds tailored to prospects’ needs and interests; and 2) flexible, ad-hoc search for users seeking specific information, such as “What are the most remote job opportunities in America?”
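An ad-hoc question like that maps naturally to SQL over the indexed data. A sketch of composing such a request in Python; the payload shape, collection, and field names (`job_postings`, `is_remote`, `country`) are assumptions for illustration, so consult Rockset’s API documentation for the exact request format:

```python
import json

# Illustrative ad-hoc SQL over an indexed job-postings collection.
# Field and collection names are hypothetical, not a real schema.
sql = """
SELECT job_title, COUNT(*) AS openings
FROM job_postings
WHERE is_remote = true AND country = 'US'
GROUP BY job_title
ORDER BY openings DESC
LIMIT 20
"""

# A JSON body of this general shape would be POSTed to the query
# endpoint with an API key, e.g. requests.post(url, data=payload, ...).
payload = json.dumps({"sql": {"query": sql}})
print(json.loads(payload)["sql"]["query"].lstrip().splitlines()[0])
```

Because the data is already indexed, queries like this can stay interactive even when the filter and grouping dimensions are not known ahead of time.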
Sub-Second Analytics and Quicker Iterations
After several months of development and testing, we migrated our Labor Market Intelligence database from MongoDB to Rockset and Databricks. Databricks now handles our massive datasets, machine learning models, and other non-time-critical processing, while Rockset supports sophisticated queries over those datasets, delivering results to customers in milliseconds at low computational cost.
With our platform, clients can instantly retrieve the top 20 skills for any location in the world, receiving results in near real time. We can support tens of thousands of customer queries daily, regardless of query complexity, concurrent volume, or unexpected spikes across the system, such as when bursty data streams suddenly flood in.
We consistently meet our customer service level agreements (SLAs), with response times under 300 milliseconds. By delivering instant answers that precisely meet the needs of our potential customers, we outshine our competitors. Rockset’s SQL-to-REST API makes it effortless to present query results in a form that suits each use case.
By integrating Rockset, we significantly accelerate development cycles, streamlining internal processes and driving increased external sales. Previously, our team typically needed three to nine months to build and refine a concept into a viable product or service offering for our customers. With Rockset capabilities such as its SQL-to-REST Query Lambdas, we can deploy tailored dashboards for prospective buyers just hours after a comprehensive sales demonstration.
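Query Lambdas are named, saved SQL queries exposed as REST endpoints, so a customer dashboard only calls an endpoint with parameters rather than embedding SQL. A sketch of the parameterized SQL and request body such a lambda might use; the names, fields, and exact parameter format are hypothetical:

```python
import json

# Parameterized SQL that a Query Lambda might wrap; :country is bound at
# request time, so one saved query can serve many customer dashboards.
# Collection and field names are illustrative only.
lambda_sql = """
SELECT skill, COUNT(*) AS demand
FROM job_postings
WHERE country = :country
GROUP BY skill
ORDER BY demand DESC
LIMIT 20
"""

# Request body executing the lambda with a concrete parameter value.
body = {"parameters": [{"name": "country", "type": "string", "value": "GB"}]}
print(json.dumps(body))
```

Swapping a dashboard from one prospect’s market to another’s then amounts to changing a parameter value, which is what makes hours-long turnaround plausible.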
We call this milestone “Product Day Zero.” At that point, our prospects are primed, and we can confidently invite them to explore our offerings without further selling, working with live data and no discernible lag. Rockset’s low operational overhead and serverless architecture make it simple for our developers to deploy new applications to clients and prospective buyers.
We plan to further optimize our data stack while expanding the use of Rockset across more domains:
- Geospatial queries, enabling customers to locate points of interest on a map by seamlessly zooming in and out for intuitive, efficient searches.
- Serving data to our machine learning models.
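The geospatial case above typically means filtering points to the visible map viewport, then ranking by distance from its center. A stdlib sketch with made-up coordinates; in production the equivalent filter would be pushed down into the serving database’s geospatial SQL:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Hypothetical job locations: (name, lat, lon).
jobs = [
    ("London nurse", 51.51, -0.13),
    ("Manchester dev", 53.48, -2.24),
    ("NYC analyst", 40.71, -74.01),
]

# The map viewport (zoom level) as a lat/lon bounding box over the UK.
lat_min, lat_max, lon_min, lon_max = 50.0, 56.0, -6.0, 2.0
visible = [j for j in jobs
           if lat_min <= j[1] <= lat_max and lon_min <= j[2] <= lon_max]

# Rank visible jobs by distance from the viewport center.
center = (52.5, -1.5)
visible.sort(key=lambda j: haversine_km(center[0], center[1], j[1], j[2]))
print([j[0] for j in visible])  # NYC filtered out; UK jobs nearest-first
```

Zooming in simply shrinks the bounding box, so each zoom level is just a cheaper, more selective version of the same query.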
These initiatives should unfold over the next year. With Databricks and Rockset, we’ve built a formidable tech stack, and there is still plenty of room to grow.