Sarah Gurev

@sarahgurev

FutureHouse Fellowship - Sergey Ovchinnikov and Aaron Schmidt labs! Previously MIT EECS PhD with Debora Marks. Talk to me about modeling viral/immune proteins! 🦠

143 Followers
241 Following
18 Posts
Joined 29.09.2023

Latest posts by Sarah Gurev @sarahgurev

Most benchmark papers are boring but this is a masterpiece. Authors present a comprehensive breakdown of different inference strategies on variant effect prediction (VEP) in viruses. There were two surprising results that IMO deserve more attention🧵

22.01.2026 15:45 👍 16 🔁 3 💬 1 📌 0
This line graph illustrates the percentage change in agency staff levels from the previous year for nine major U.S. federal scientific and health organizations between the fiscal years 2016 and 2025. The agencies tracked include the CDC, Department of Energy, EPA, FDA, NASA, NIH, NIST, NOAA, and NSF. For the majority of the timeline between 2016 and 2023, the agencies show relatively stable fluctuations, generally staying within a range of +5% to -5% change per year. However, there is a dramatic and uniform plummet starting in the 2024–25 period. Every agency depicted shows a sharp downward trajectory, with staffing losses ranging from approximately -15% to over -25%. The Environmental Protection Agency (EPA) shows the most significant decline, dropping to roughly -26%, while the National Institute of Standards and Technology (NIST) shows the least severe but still substantial drop at approximately -15%.

This is the most astonishing graph of what the Trump regime has done to US science. They have destroyed the federal science workforce across the board. The negative impacts on Americans will be felt for generations, and the US might never be the same again.

www.nature.com/immersive/d4...

20.01.2026 22:53 👍 14449 🔁 8317 💬 90 📌 765

This was fun work and a remarkable effort across the computational and wet-lab teams!

Strategies for in-silico filtering and ranking of antibody designs have been under-discussed in the literature, e.g. in most technical reports on antibody design that I've seen. Let's talk about these here! [1/n]

15.01.2026 11:19 👍 18 🔁 8 💬 1 📌 0

This complements our prior evaluation of deep mutational scans showcasing PLM underperformance, even when there is especially low sequence diversity.

There is still much room for improvement for viral modeling!

13.01.2026 19:06 👍 0 🔁 0 💬 0 📌 0

Computational variant effect predictors effectively predict viral evolution – even more so than deep mutational scans (DMS).

Yet, PLM or hybrid approaches (even with data-leakage inflating performance) provide little benefit over the best alignment-based model (EVE).

13.01.2026 19:06 👍 2 🔁 1 💬 1 📌 0

🦠📈 We ask whether models can predict mutations that actually rose in frequency in nature, providing the clearest available evaluation of their utility for pandemic preparedness.

13.01.2026 19:06 👍 1 🔁 0 💬 1 📌 0

We now evaluate protein language models (PLMs), alignment-based models, and hybrids across 30 clades from 4 well-sequenced viruses (SARS-CoV-2, HIV, H3N2, and H1N1) - the first large-scale evaluation of these models on real viral evolution forecasting.

13.01.2026 19:06 👍 1 🔁 0 💬 1 📌 0

We’ve updated the EVEREST benchmark to include real-world viral evolution! www.biorxiv.org/content/10.1...

Co-led by Noor Youssef and me, along with co-authors Navami Jain, Aarushi Mehrotra, Sarrah Leung, Abigail Jackson, @deboramarks.bsky.social, and with @cepi.net @futurehousesf.bsky.social!

13.01.2026 19:06 👍 5 🔁 2 💬 1 📌 0
Fresh conflicts erupt around giant database for flu and COVID-19 sequences
Critics say “autocratic” behavior by GISAID could hamper response to a future pandemic

The fundamental problem with GISAID is this - data on the platform are neither open, nor FAIR.

Such data - key for combating infectious diseases - are, by a large margin, paid for by taxpayers and hence must be openly available to all.

With GISAID, they are not.

www.science.org/content/arti...

07.01.2026 17:23 👍 21 🔁 8 💬 1 📌 0

New preprint! We measured temperature- and pH-induced aggregation for over 18,000 natural and de novo designed protein domains!

19.11.2025 21:16 👍 121 🔁 42 💬 4 📌 3

End-to-end protein design in the browser through evedesign. Generate and interactively explore designs in 2D/3D and export them as codon-optimized DNA. The underlying open-source framework (released soon) is built so that new methods are easy to add; more on that soon.
🌐 evedesign.bio

22.10.2025 14:30 👍 93 🔁 29 💬 2 📌 1

Thrilled to announce our new preprint, “Protein Hunter: Exploiting Structure Hallucination within Diffusion for Protein Design,” in collaboration with @Griffin, @GBhardwaj8 and @sokrypton.org

🧬Code and notebooks will be released by the end of this week.
🎧Golden - Kpop Demon Hunters

13.10.2025 15:45 👍 52 🔁 16 💬 3 📌 2

Large AI models are reported to achieve high accuracy (AUROC) predicting pathogenic variants across the genome.

A preprint reports that the predictions are based on splice variants. Using only this info (no sequences, no AI) achieves AUROC=0.944 across noncoding variants.

1/2

09.09.2025 23:01 👍 15 🔁 7 💬 1 📌 0
Near real-time data on the human neutralizing antibody landscape to influenza virus to inform vaccine-strain selection in September 2025
The hemagglutinin of human influenza virus evolves rapidly to erode neutralizing antibody immunity. Twice per year, new vaccine strains are selected with the goal of providing maximum protection again...

In new study led by @ckikawa.bsky.social, we provide near real-time data on human neutralizing antibody landscape to influenza by measuring ~26,000 titers to >100 recent viral strains

Data can inform vaccine selection & evolutionary/epidemiological modeling
www.biorxiv.org/content/10.1...

08.09.2025 21:48 👍 60 🔁 35 💬 1 📌 1
Recent advances in the inference of deep viral evolutionary history | Journal of Virology
Phylogenetic studies examining the origins, emergence, and spread of viruses have arguably been one of the most active and successful areas of evolutionary biology and form the bedrock of the flourishing field of genomic epidemiology. This, in part, reflects the ability of viruses, particularly those with RNA genomes, to evolve at rates much greater than their cellular counterparts (1). The rapid rate at which viruses evolve and accumulate mutations enables evolutionary signals to be identified through comparative genomics at short timescales relevant for outbreak investigation and response. The integration of phylogenetics and epidemiology, known as phylodynamics, has become a vital tool in response to numerous viral outbreaks, epidemics, and pandemics, including Ebola (2), Zika (3), and, more recently, COVID-19 (4) and mpox (5).

There’s been a bunch of new approaches looking at deep viral evolutionary history. We’ve put together a mini review highlighting some recent advancements in structural phylogenetics and time-dependent rate models and what they could do for the field 🦠
🔗 journals.asm.org/doi/full/10....

25.08.2025 20:32 👍 28 🔁 13 💬 2 📌 2
Divergent viral phosphodiesterases for immune signaling evasion
Cyclic dinucleotides (CDNs) and other short oligonucleotides play fundamental roles in immune system activation in organisms ranging from bacteria to humans. In response, viruses use phosphodiesterase...

Excited to share our new preprint co-led by @jnoms.bsky.social!

Here we reveal an exceptional diversity of viral 2H phosphodiesterases (PDEs) that enable immune evasion by selectively degrading oligonucleotide-based messengers. This 2H PDE fold has evolved striking substrate breadth & specificity.

22.08.2025 19:02 👍 43 🔁 28 💬 2 📌 3

Thanks!

20.08.2025 20:50 👍 0 🔁 0 💬 0 📌 0
Variant effect prediction with reliability estimation across priority viruses
Viruses pose a significant threat to global health due to their rapid evolution, adaptability, and increasing potential for cross-species transmission. While advances in machine learning and the growi...

🦠The future of pathogen forecasting needs rigorous benchmarks and domain-specific modeling, not only bigger PLMs. EVEREST is a step in that direction.

🔗Paper: biorxiv.org/content/10.1...
💻Code + data: github.com/debbiemarksl...
12/12

17.08.2025 03:42 👍 6 🔁 1 💬 1 📌 0

🙏Amazing collaboration co-led with Noor Youssef
and Navami Jain, @deboramarks.bsky.social, and our funders @cepi.net!
11/12

17.08.2025 03:42 👍 2 🔁 0 💬 1 📌 0

This matters for:
⚠️ Future-proof vaccine and therapeutics design
⚠️ Monitoring of high-pandemic risk viruses
⚠️ Dual-use biosecurity risk assessment

Without reliable models, we risk underestimating viral evolution, and overestimating our ability to counter it.
10/12

17.08.2025 03:42 👍 3 🔁 0 💬 1 📌 0

EVEREST highlights:
✅ Where models fail, and why
✅ Which viruses are least/most predictable
✅ How to estimate per-protein, model-specific reliability
✅ Concrete steps to improve ML for viral mutation prediction
9/12

17.08.2025 03:42 👍 6 🔁 0 💬 2 📌 0

🌍Current models fail to reliably predict mutations in more than half of the high-priority viruses identified by the WHO.
8/12

17.08.2025 03:42 👍 4 🔁 0 💬 1 📌 0

💪Is bigger always better? Maybe not for other taxa but for viruses - yes! For viruses, models continue to improve with increased numbers of parameters.
7/12

17.08.2025 03:42 👍 4 🔁 0 💬 1 📌 0

🤏Why? Viruses are severely underrepresented in training datasets (<1%) and are further downsampled after common clustering approaches.
6/12

17.08.2025 03:42 👍 8 🔁 0 💬 1 📌 0

📉Despite the hype, protein language models trained across the “protein universe” are outperformed by even the simplest, site-independent alignment-based model.
5/12

17.08.2025 03:42 👍 14 🔁 2 💬 1 📌 0

💭Imagine: It’s Day 0 of an outbreak and there’s little experimental data. Computational mutational effect predictions could provide valuable information… if we could trust them. Can we?

EVEREST doesn’t just assess performance. It also quantifies reliability for new viruses.
4/12

17.08.2025 03:42 👍 2 🔁 0 💬 1 📌 0

🚀To find out, we built EVEREST: Evolutionary Variant Effect prediction with Reliability ESTimation.

We benchmark models across 45 viral deep mutational scanning datasets spanning >340,000 mutations.
3/12

17.08.2025 03:42 👍 2 🔁 0 💬 1 📌 0

🦠 Protein language models (PLMs) have shown impressive performance in predicting mutation effects. But... viruses are a different beast.

They evolve fast, cross species, and are under pressure from host immunity. Do PLMs still work here?
2/12

17.08.2025 03:42 👍 4 🔁 0 💬 1 📌 0

🚨New paper 🚨

Can protein language models help us fight viral outbreaks? Not yet. Here’s why 🧵👇
1/12

17.08.2025 03:42 👍 56 🔁 23 💬 3 📌 2
Protein Structure Informed Bacteriophage Genome Annotation with Phold
Bacteriophage (phage) genome annotation is essential for understanding their functional potential and suitability for use as therapeutic agents. Here we introduce Phold, an annotation framework utilis...

Stoked to finally have a preprint out for Phold, our tool that uses protein structural information to enhance phage genome annotation #phagesky 1/n

www.biorxiv.org/content/10.1...

08.08.2025 07:10 👍 137 🔁 66 💬 5 📌 4