Testing DeepSomatic’s ability to identify cancer-related variants
We trained DeepSomatic on three of the breast cancer genomes and the two lung cancer genomes in the CASTLE reference dataset. We then tested DeepSomatic’s performance in several ways, including on the one breast cancer genome that was not included in its training data, and on chromosome 1 from each sample, which we also excluded from training.
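The held-out evaluation described above can be pictured as a simple filter over labeled examples: anything from the held-out genome, or from chromosome 1 of any sample, is kept out of training. This is a minimal sketch under stated assumptions, not DeepSomatic's actual pipeline. The `Example` record, the `split` function and the sample name `breast_sample_held_out` are hypothetical, and the sketch assumes each example is tagged with its source sample and chromosome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    sample: str  # hypothetical sample identifier
    chrom: str   # chromosome name, e.g. "chr1"
    pos: int     # position on the chromosome


# Hypothetical names: the held-out genome and the held-out chromosome.
HELD_OUT_SAMPLE = "breast_sample_held_out"
HELD_OUT_CHROM = "chr1"


def split(examples):
    """Route each example to the training set or the evaluation set."""
    train, test = [], []
    for ex in examples:
        # Exclude the whole held-out genome, plus chr1 from every sample.
        if ex.sample == HELD_OUT_SAMPLE or ex.chrom == HELD_OUT_CHROM:
            test.append(ex)
        else:
            train.append(ex)
    return train, test
```

Holding out both a whole sample and a whole chromosome checks two different things: whether the model generalizes to an unseen genome, and whether it generalizes to unseen genomic regions within genomes it has seen.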
Results show that the DeepSomatic models developed for each of the three major sequencing platforms performed better than other methods, identifying more tumor variants with higher accuracy. The tools used for comparison on short-read sequencing data were SomaticSniper, MuTect2 and Strelka2 (with SomaticSniper used specifically for single nucleotide variants, or SNVs). For long-read sequencing data we compared against ClairS, a deep learning model trained on synthetic data.
In our tests DeepSomatic identified 329,011 somatic variants across the six reference cell lines and a seventh preserved sample. DeepSomatic does particularly well at identifying cancer variants that involve insertions and deletions (“Indels”) of genetic code. For these types of variants, DeepSomatic substantially increased the F1-score, a balanced measure of how well the model finds the true variants in a sample (recall) while avoiding false positives (precision). On Illumina sequencing data the next-best method scored 80% at identifying Indels, while DeepSomatic scored 90%. On Pacific Biosciences sequencing data, the next-best method scored less than 50% at identifying Indels, while DeepSomatic scored more than 80%.
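To make the F1-score concrete: precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean, 2 × precision × recall / (precision + recall), where TP, FP and FN count true positives, false positives and false negatives. The sketch below computes all three from a set of variant calls and a truth set. It assumes, for simplicity, that a call is correct only when chromosome, position, reference allele and alternate allele all match exactly; real somatic benchmarks typically normalize variant representations first, which matters especially for Indels. The `Variant` type and `precision_recall_f1` function are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str  # reference allele
    alt: str  # alternate allele


def precision_recall_f1(calls: set, truth: set) -> tuple:
    """Score a call set against a truth set using exact matching.

    Simplifying assumption: no variant normalization, so an Indel written
    differently from the truth set would count as both an FP and an FN.
    """
    tp = len(calls & truth)  # called and present in the truth set
    fp = len(calls - truth)  # called but absent from the truth set
    fn = len(truth - calls)  # in the truth set but never called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


truth = {
    Variant("chr1", 100, "A", "T"),    # SNV
    Variant("chr1", 200, "AG", "A"),   # deletion
    Variant("chr2", 50, "C", "CTT"),   # insertion
    Variant("chr2", 80, "G", "A"),     # SNV that the caller misses
}
calls = {
    Variant("chr1", 100, "A", "T"),
    Variant("chr1", 200, "AG", "A"),
    Variant("chr2", 50, "C", "CTT"),
    Variant("chr3", 10, "T", "G"),     # false positive
}
print(precision_recall_f1(calls, truth))  # 3 TP, 1 FP, 1 FN -> (0.75, 0.75, 0.75)
```

Because F1 is a harmonic mean, a caller cannot reach a high score by maximizing only recall or only precision; both have to be high.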