Had a great time at the #ML4NGP Training school in Sevilla! Back to Italy tomorrow (normally it would be back to London but things change!)
P.S. Signed my TT contract with @unipd.bsky.social , starting March 23rd. Exciting!
To advance the family-based modelling approach, we are releasing the entire framework open source:
ProFam Atlas: A curated, large-scale training corpus containing nearly 40 million protein families.
Code & Weights: github.com/alex-hh/prof...
Data: zenodo.org/records/1771...
For design, ProFam-1 excels at homology-guided generation. It produces diverse sequences with low sequence identity to natural proteins while preserving predicted structural similarity and conservation patterns of the natural family, even when conditioning on just a single example sequence.
Built by CATH, TUM and NVIDIA, ProFam-1 is our new open-source protein family language model (pfLM) designed to generate functional protein variants and predict fitness using in-context example sequences.
I'll continue working on algorithms, deep learning, and AI-based methods to explore the protein structural and functional landscape.
Starting in early 2026!
Excited to return after 10 years around Europe!
Going full circle!
The University of Padua was home for both my Bachelor's and Master's degrees.
After 7 amazing years at UCL in the Orengo Group, I'm really happy to share that I've won a Tenure-Track Assistant Professorship in Biochemistry at my Alma Mater!
From Sameer Velankar & colleagues in @narjournal.bsky.social #NARDatabaseIssue | #AlphaFold #Protein #Structure #Database 2025: a redesigned interface and updated structural coverage | #Bioinformatics #Proteomics #OpenScience #AFDB 🧪 CC/ @ebi.embl.org
⬇️
academic.oup.com/nar/advance-...
We are looking for a computational postdoc to work with us on new optimisation algorithms to make #RELION even better. Join our bubbly team at the @mrclmb.bsky.social in Cambridge, UK. RTs appreciated.
mrc.tal.net/vx/appcentre...
It was lovely to speak at the CATH 30 symposium, celebrating 30 years of the @cathgene3d.bsky.social protein structure classification database. I was presenting recent work on our new generative protein-family language model: preprint coming soon.
Packing for our first flight with our kid tomorrow. Wish us luck!
We went from 9kg of checked luggage for 2 months in Thailand to 3 checked suitcases and a pram. Send help!
We have a stellar lineup of speakers!
Christine Orengo
Burkhard Rost
Janet Thornton
David Jones
Gonzalo Parra @gonzaparra.bsky.social
Sameer Velankar
Alex Bateman
Maria Martin
Rob Finn
Gerardo Tauriello
Alexey Murzin
There will be talks from world leaders in structural bioinformatics on various themes including pioneering protein language models and key international resources including: PDBe, InterPro, UniProt, MGnify, SWISS-MODEL, FrustraEvo and CATH.
CATH turns 30 years old this year!
We are organising a 1-day symposium on September 16th at UCL, highlighting recent AI-based developments to enhance protein family classifications, annotations and analyses.
www.eventbrite.co.uk/e/protein-an...
Thank you David! Officially a guiri!
Today I became a British citizen! 🇬🇧
#ISMBECCB2025 is over! Back to London tomorrow after a science feast, a talk, and a selfie with John Jumper. Not too bad!
Today at 2 PM at 3DSIG #ISMBECCB2025, @nbordin.bsky.social presents our joint work on metagenomic-scale clustering and novel domain discovery in predicted structures!
www.biorxiv.org/content/10.1...
Also check out poster:
B-50 lolalign Sensitive structural alignments by Lasse
B-123 BFVD by Rachel
Off to Liverpool for #ISMBECCB2025!
Looking forward to some awesome science and friends!
Just reverted the video to explain protein folding!
"in 2025 we will have flying cars"
We've updated our AFESM website to now include biome filtering, allowing exploration of protein structures adapted to specific environments.
afesm.foldseek.com
Read more about the work in the skeetorial
bsky.app/profile/mart...
or our preprint
www.biorxiv.org/content/10.1...
Pinging @jingiyeo.bsky.social and @martinsteinegger.bsky.social
Very good point! It might be worth investigating. We noticed this behaviour also when we clustered TED (over 81M singletons). That analysis was done at the domain level, not at the chain level, but the clustering wasn't that strict. Here I focussed more on the downstream from the domain end of things.
Amazing effort by @jingiyeo.bsky.social, @yewonhan.bsky.social, Andy Lau, @shaunkandathil.bsky.social, @hbkgenomics.bsky.social, Eli Levy Karin and @cathgene3d.bsky.social !
Explore AFESM with our website! You can search your favorite proteins from ESMatlas or AFDB using their identifiers. It's still a work in progress, with many exciting features on the way! Thanks @milot.bsky.social !
However, these novel domain combinations comprise only a small fraction (0.3%) of ESM-only clusters. The remainder are mostly low-quality predictions (53%), fragments (16%), known domains with potential unknown extensions (19%), or without identifiable domains (9.3%).
@yewonhan.bsky.social identified 11,941 novel multi-domain combinations!
We found membrane-associated domains (e.g., TonB dependent receptor), highlighting domain recombination rather than new folds as a driver of structural innovation in ESMatlas.
ESM-only clusters contain ZERO novel folds using the TED workflow. Re-modelling discarded domains (2.3M) with ColabFold revealed 1 novel fold; unlike AFDB's >7k novel folds, hinting at a saturating fold space or ESMfold limitations.
With MGnify environmental labels, we computed the lowest common biomes per structural cluster, revealing protein adaptations unique to specific environments, especially extreme ones like hyperthermal, hypersaline, and glaciers.
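For anyone curious what a "lowest common biome" could look like in practice, here is a minimal sketch. It assumes each cluster member carries a colon-separated biome lineage string (e.g. "root:Environmental:Aquatic:Marine") and takes the deepest level shared by all members. The function name, the lineage format, and the example values are illustrative assumptions, not the pipeline used in the preprint.

```python
from typing import Iterable


def lowest_common_biome(lineages: Iterable[str], sep: str = ":") -> str:
    """Return the deepest biome level shared by every lineage.

    Hypothetical helper: assumes lineages are paths such as
    'root:Environmental:Aquatic:Marine', split on `sep`.
    """
    split = [lineage.split(sep) for lineage in lineages]
    if not split:
        raise ValueError("need at least one lineage")
    common = []
    # Walk the lineages level by level and stop at the first disagreement.
    for level in zip(*split):
        if len(set(level)) != 1:
            break
        common.append(level[0])
    return sep.join(common)


# Made-up example: one structural cluster with members from two biomes.
cluster = [
    "root:Environmental:Aquatic:Marine",
    "root:Environmental:Aquatic:Freshwater",
]
print(lowest_common_biome(cluster))
```

A cluster whose members all come from one biome keeps that full lineage, while a cluster spread over unrelated environments collapses towards the root.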