Experiments
We wanted to understand which models and tasks would benefit most from our curation process. As baselines for our experiments, we fine-tuned two LLMs of different sizes (Gemini Nano-1 with 1.8B parameters and Nano-2 with 3.25B parameters) on two tasks of different complexity (lower and higher, based on expert alignment) using crowdsourced labels. Each crowdsourced data set has ~100K annotations and a strong class imbalance, with around 95% benign labels on average.
We compared each of these four baseline conditions against the corresponding curated condition, in which each model (Nano-1 and Nano-2) is fine-tuned over multiple rounds using the curation process described above. At each iteration, we selected our curated set of examples and used them for model evaluation and fine-tuning, as described above. All models plateaued before reaching parity with the experts' internal alignment, so we stopped at 6 iterations (~400 fine-tuning and ~250 evaluation samples) for the lower complexity task and 5 iterations (~250 fine-tuning and ~150 evaluation samples) for the higher complexity task. (Note that the lower complexity task had a larger variety of examples, which may account for the longer time needed to converge.) Both data sets had a final class balance of ~40% positive examples.
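For concreteness, the sketch below outlines the shape of this iterative loop under stated assumptions: the curation, fine-tuning, and alignment-scoring steps are stand-in callables (select_curated_batch, finetune, expert_alignment), and the plateau check is a simple minimum-gain threshold rather than the exact stopping rule we used.

```python
# Minimal sketch of the iterative curation loop, not the production pipeline.
# The three helper callables are assumptions supplied by the reader.
from typing import Callable, List, Tuple


def curate_and_finetune(
    model,
    select_curated_batch: Callable[[object, int], Tuple[List, List]],  # -> (train, eval) examples
    finetune: Callable[[object, List], object],
    expert_alignment: Callable[[object, List], float],  # e.g. Cohen's Kappa vs. expert labels
    expert_ceiling: float,       # experts' internal pairwise Kappa (~.81 / ~.78 here)
    max_iterations: int = 10,
    min_gain: float = 0.01,      # stop once alignment gains plateau
):
    history = []
    for it in range(max_iterations):
        # Curate a fresh batch of examples, then fine-tune and re-evaluate.
        train_batch, eval_batch = select_curated_batch(model, it)
        model = finetune(model, train_batch)
        kappa = expert_alignment(model, eval_batch)
        history.append(kappa)
        # Stop when the model reaches expert-level agreement or stops improving.
        if kappa >= expert_ceiling:
            break
        if len(history) > 1 and kappa - history[-2] < min_gain:
            break
    return model, history
```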
The table below provides an overview of the scale and quality of the data used in each condition. Experts reached an average pairwise Cohen's Kappa of .81 (on the lower complexity task) and .78 (on the higher complexity task) through the curation process. We consider these the ceiling for model performance. To assess the quality of our crowdsourced data, we calculated Kappa alignment between crowdsourced annotations and experts based on our full curated set, which was .59 (lower complexity) and .41 (higher complexity).
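A minimal sketch of how such agreement figures can be computed with scikit-learn's cohen_kappa_score is shown below; the label arrays are toy placeholders, not the study's data.

```python
# Average pairwise Cohen's Kappa among experts, and Kappa between the
# crowdsourced labels and the expert consensus. Toy data for illustration only.
from itertools import combinations

import numpy as np
from sklearn.metrics import cohen_kappa_score

# Rows: annotators; columns: the same items labeled by each (1 = positive, 0 = benign).
expert_labels = np.array([
    [1, 0, 1, 1, 0, 0, 1, 0],   # expert A
    [1, 0, 1, 0, 0, 0, 1, 0],   # expert B
    [1, 0, 1, 1, 0, 1, 1, 0],   # expert C
])
crowd_labels = np.array([1, 0, 0, 1, 0, 0, 0, 0])  # crowdsourced labels for the same items
expert_consensus = (expert_labels.mean(axis=0) >= 0.5).astype(int)

# Average pairwise Cohen's Kappa among experts (the "ceiling" reported above).
pairwise = [cohen_kappa_score(a, b) for a, b in combinations(expert_labels, 2)]
print("expert pairwise Kappa:", np.mean(pairwise))

# Kappa between crowdsourced annotations and the expert consensus.
print("crowd vs. experts Kappa:", cohen_kappa_score(crowd_labels, expert_consensus))
```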