Thanks to all of our SMaHT colleagues and especially to @sedlazeck.bsky.social who led the hackathon which spawned the prototype of this pipeline!
MosaicSim offers a realistic, scalable approach for assessing detection limits, with immediate applications to large sequencing efforts including those within the SMaHT Network, which was the springboard for this work.
A key (surprising) result was that ultra-high coverage (300×–450×) yields diminishing returns for mosaic variant detection. In many settings, 150× coverage performs comparably or better, highlighting opportunities for cost-effective study design.
Using MosaicSim, we benchmarked DRAGEN and found strong VAF- and depth-dependent performance limits. Sensitivity decreases sharply at low VAF, especially in complex genomic regions.
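One way to build intuition for the VAF and depth dependence is a sampling-only sketch. This is not MosaicSim or DRAGEN: it assumes alt-supporting reads follow a binomial distribution, and the `min_alt_reads` threshold of 3 is a hypothetical choice for illustration.

```python
from math import comb


def prob_at_least_k_alt_reads(depth: int, vaf: float, min_alt_reads: int = 3) -> float:
    """P(X >= min_alt_reads) for X ~ Binomial(depth, vaf).

    Idealized: ignores sequencing error, mapping artifacts, and caller filters.
    """
    # Probability of observing fewer than min_alt_reads alt-supporting reads.
    p_fewer = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Hypothetical depths and VAFs, chosen only to show the trend.
for depth in (150, 300, 450):
    for vaf in (0.005, 0.01, 0.05):
        p = prob_at_least_k_alt_reads(depth, vaf)
        print(f"depth={depth}x vaf={vaf:.3f} P(>=3 alt reads)={p:.3f}")
```

Because this toy model ignores noise and caller behavior, it will not reproduce the diminishing returns seen at ultra-high coverage; it only shows why very low VAFs leave few supporting reads to work with.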
Detecting mosaic variants is challenging due to low VAFs and real sequencing noise. MosaicSim layers user-defined variants directly onto empirical WGS data, preserving true read-level properties while providing a controlled ground-truth set for benchmarking.
We are pleased to share our new preprint introducing MosaicSim, a framework for generating realistic mosaic variants! Mosaic variants - mutations present in only a subset of cells - are crucial for development, disease, and cancer, but are notoriously hard to call.
www.biorxiv.org/content/10.6...
A fun lab outing to the zoo ahead of conference season!
So since we only include >0.1% MAF variants in this article we can't address ultrarare, but check out Supp Fig 3; when comparing ancestry-specific AFs many variants deviate from the 1:1 line. We plotted this on the log₁₀(AF) scale to help magnify the low-frequency range.
To limit the noise from ultra-rare alleles we only looked at variants ≥0.1% MAF. Totally appreciate that's still quite low frequency, but even with that filter, we still saw the noted ancestry-specific frequency differences.
Great point; we thought about that too! Pragati stratified by whether variants were monomorphic or not to capture at least that aspect, but you're right that the impact depends on where a variant sits on the SFS. Rare ones can show big fold-changes but small absolute shifts.
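To make the fold-change versus absolute-shift point concrete, here is a small sketch. The allele frequencies are made up for illustration (they are not from the paper), and `compare_frequencies` is a hypothetical helper name.

```python
from math import log10


def compare_frequencies(af_a: float, af_b: float) -> dict:
    """Summarize how two ancestry-specific allele frequencies differ."""
    return {
        "fold_change": af_b / af_a,       # relative difference
        "abs_diff": abs(af_b - af_a),     # absolute difference
        "log10_af_a": log10(af_a),        # position on a log10(AF) axis
        "log10_af_b": log10(af_b),
    }


# Hypothetical example values: one rare variant, one common variant.
rare = compare_frequencies(0.001, 0.005)
common = compare_frequencies(0.20, 0.25)

print("rare:  ", rare)
print("common:", common)
```

With these invented values the rare variant has the larger fold-change but the smaller absolute shift, which is why a log₁₀(AF) axis makes low-frequency deviations easier to see.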
Texas Children's/Baylor College of Medicine Researchers Create Groundbreaking Tool to Improve Accuracy of #GeneticTesting @egatkinson.bsky.social @bcmgenetics.bsky.social @bcmhouston.bsky.social #TCHResearchNews #TexasChildrens @natcomms.nature.com tinyurl.com/jj6kyrrv
Thrilled to share our new @natcomms.nature.com paper on local ancestry informed allele frequencies in gnomAD, which are live now on the browser! Check out my stellar PhD student @pragskore.bsky.social's Bluetorial on how this brings finer detail to variant interpretation 🧬🖥️
A project many years in the making, we're pleased to present our work on multi-ancestry meta-analysis across a boatload of traits in the UK Biobank: www.nature.com/articles/s41...
Delighted to amplify my talented PhD student's work! Check it out for a great way to streamline and harmonize Tractor analyses.
Thanks for the interest! The tutorial code is available to download as supplemental information of the paper, and has been deposited as a community workspace in the All of Us Researcher Workbench.
In summary, we present a replicable training model that empowers early-career researchers - including and especially those new to computational genomics - to responsibly leverage large-scale biobank data into their research programs and teaching.
Across years 1–3, outcomes that scholars reported as stemming directly from this training included:
• 17 conference presentations
• Multiple funded research grants
• Numerous genomics modules added in undergrad courses
• Sustained collaborations across institutions
During the summit, scholars used real short-read WGS data to:
• Prepare phenotypes & covariates
• Run GWAS via Hail
• Visualize results with PCA, Manhattan & QQ plots
• Manage compute costs
All in ~4 hours with no prior coding required.
Our training was part of the All of Us Biomedical Researcher Scholars Program through @bcmgenetics.bsky.social focused on mentoring early-stage faculty in genomic data science. The curriculum launches with an intensive Faculty Summit, where scholars get hands-on experience working with genomic data.
Access to big genomic data is growing, but parallel access to skills needed to use it hasn't kept up.
We created an accessible, cloud-based genomic analysis training bootcamp using real All of Us data, Jupyter notebooks, and the Hail framework to lower the barrier for early-career researchers.
🚨 New perspective piece in @ajhgnews.bsky.social! 🚨
We developed a hands-on training resource for large-scale genomic data analysis in the All of Us Researcher Workbench, now published here:
Tractor-Mix builds on Tractor's strengths to detect ancestry-enriched signals while adding power and robust false-positive control for relatedness via a GRM. By modeling both admixture and relatedness, it overcomes key GWAS barriers and enables more accurate, representative genomic discovery.
Tractor-Mix uses ancestry-specific genotypes as predictors, outputting ancestry-specific effect sizes and P values. We benchmark our new tool in simulations and apply it to multiple admixed cohorts (including the UK Biobank and the Mexico City Prospective Study), uncovering signals missed by standard GWAS.
In this work, we introduce Tractor-Mix, a new GWAS method that extends Tractor to handle related admixed samples. It combines a mixed model framework (like GMMAT) with local ancestry-aware genotypes (like Tractor) in a 2 d.o.f. test.
As biobanks and global cohorts grow, so does the inclusion of admixed individuals with close or cryptic relatedness. This introduces the statistical challenge of two interwoven sources of stratification: admixture and relatedness, which are rarely handled together.
We previously developed Tractor, a local ancestry-aware GWAS method that's been widely used to uncover ancestry-enriched signals and refine genetic architecture in admixed populations. But Tractor (being a GLM) only works on unrelated samples, limiting its use in many real-world datasets.
We're excited to introduce Tractor-Mix, our new method for GWAS in admixed cohorts with relatedness, led by the fantastic @doubletaotan.bsky.social! Read the full preprint here: www.medrxiv.org/content/10.1...
Thanks to all our amazing collaborators who helped make this work possible!
Check out my stellar PhD student Pragati's talk on our work generating local ancestry informed frequency estimates in gnomAD as part of the prestigious Emerging Genomic Scientist Symposium next week! Congrats on being selected for this amazing event!
I'm delighted to be part of this symposium, put on by University of Pennsylvania Perelman School of Medicine, and led by @bpasaniuc.bsky.social and @sarahtishkoff.bsky.social. See you in a few weeks! upenn.co1.qualtrics.com/jfe/form/SV_...
Huge thanks to all our amazing LAGC collaborators! Special shoutout to Estela Bruxel and Diego Rovaris for leading this crucial work, and of course @janitzamontalvo.bsky.social and @giustilab.bsky.social for co-founding the LAGC and co-leading alongside myself. 💪