ProteinTTT is now easy to run on Hugging Face Spaces and Google Colab. We'll also be presenting the paper at ICLR 2026 🇧🇷
🤗 Hugging Face Space: huggingface.co/spaces/pimen...
Google Colab: colab.research.google.com/drive/1l_h7c...
🧵
05.03.2026 12:08
Joining the All-The-Bacteria project provides an annualised return on investment of 8.9% bacterial genomes.
27.02.2026 07:29
Overview - AllTheBacteria documentation
Courtesy of @martibartfast.bsky.social, we have a new release of AllTheBacteria which adds another 322,920 assemblies, covering all ENA (Illumina, isolate) prokaryotes to May 2025.
allthebacteria.readthedocs.io/en/latest/ov...
26.02.2026 15:48
@wytamma.bsky.social's WASM tools have transformed my experience of teaching Python to first year undergraduate biologists this year. Since last year, I've been teaching ~400 undergrads how to code (functions, lists, dictionaries, loops) over (one hour intro lecture +) two 2-hour practicals. 1/n
25.02.2026 22:33
iVoMs - ISVM
The @isvm-society.bsky.social is organising monthly #virus of #microbes (including #phage). Nominate a speaker and register for the upcoming events at isvm.org/ivom/
17.02.2026 20:24
Super happy that the AllTheBacteria hypothetical proteins are now in AFDB - hopefully we can start to understand the function of some of them at least
18.02.2026 04:46
AlphaFold Database welcomes community datasets
Latest AlphaFold Database update adds high-value datasets for microbial and viral proteins, generated by specialist communities
Delighted to see over 17 million new protein structure predictions from novel proteins in AllTheBacteria are now integrated into the AlphaFold Database at @ebi.embl.org !
Huge work from @gbouras13.bsky.social @oschwengers.bsky.social and friends to generate these.
www.ebi.ac.uk/about/news/u...
17.02.2026 13:52
Inferring context-specific site variation with evotuned protein language models
Abstract. Multiple sequence alignments (MSAs) have been traditionally used for making inferences about site-specific diversity in proteins. Recent advancem
Our paper on inferring context dependent entropy using protein language models is officially out in NAR Genomics & Bioinformatics! 🧬
with Adam Strange, Jumpei Ito, and @systemsvirology.bsky.social
academic.oup.com/nargab/artic...
details below...
#NARGAB
14.02.2026 06:01
How do bacterial pangenomes evolve, what controls their dynamics, why do they exist?
Fitting a mechanistic model to 450 species from allthebacteria.org suggests that fast vs slow gene exchange (i.e. amount of MGEs) is a major differentiating factor, correlated with phylogeny rather than lifestyle
09.02.2026 10:55
Addressing pandemic-wide systematic errors in the SARS-CoV-2 phylogeny - Nature Methods
This Resource paper presents a global SARS-CoV-2 phylogenetic tree of 4,471,579 high-quality genomes consistently constructed by Viridian, an efficient amplicon-aware assembler.
A long time ago in a galaxy far away, there was a SARS-CoV-2 pandemic. Our paper, led by @martibartfast.bsky.social
a) correcting errors in 4.5 million genomes & their phylogeny
b) improving representation of the Global South in public data
www.nature.com/articles/s41...
(thread 1/n)
09.02.2026 15:16
Introducing The Structural History of Eukarya (SHE): The first proteome-scale phylogeny constructed entirely from 3D structure.
We computed 300 trillion alignments across 1,542 species to map the tree of life. 🧵 (1/5)
07.02.2026 08:50
Hugging Face - The AI community building the future.
We're on a journey to advance and democratize artificial intelligence through open source and open science.
Super excited to announce the release of gene and intergenic region annotation from the largest bacterial genome and MAG datasets available, including AllTheBacteria, GTDB, SPIRE, HRGM, mOTUs and MGnify - dereplicated and available from HuggingFace huggingface.co/AllTheBacteria
05.02.2026 13:27
ONT read QC strategies for assembly
a blog for miscellaneous bioinformatics stuff
New blog post: ONT read QC strategies for assembly
rrwick.github.io/2026/02/05/r...
Mini-study comparing a few QC/subsampling approaches, plus practical notes from my experience.
05.02.2026 03:38
Release v1.12 - Just do it, but don't crash · oschwengers/bakta
This is the twelfth minor release (v1.12) providing more than 10 minor improvements and many bug fixes improving runtime stability, IO compatibility, and last but not least user experience.
Compati...
🦠🧬🖥️ Bakta v1.12.0 is out
with tons of tiny improvements and bug fixes, too many to list all:
- partial genes on linear seqs
- improved error handling & runtimes
- support Python 3.12 & 3.13
- ...
A huge shout out and thank you to all bug reporters and contributors!
github.com/oschwengers/...
02.02.2026 08:05
Multiple protein structure alignment at scale with FoldMason
Protein structure is conserved beyond sequence, making multiple structural alignment (MSTA) essential for analyzing distantly related proteins. Computational prediction methods have vastly extended ou...
FoldMason is out now in @science.org. It generates accurate multiple structure alignments for thousands of protein structures in seconds. Great work by Cameron L. M. Gilchrist and @milot.bsky.social.
www.science.org/doi/10.1126/...
search.foldseek.com/foldmason
github.com/steineggerla...
30.01.2026 06:11
Release slow5tools-v1.4.0 · hasindu2008/slow5tools
What's Changed
slow5tools skim supports the new auxiliary field open_pore_level introduced in latest ONT pod5
slow5tools degrade has new profiles added (by @sashajenner and @hiruna72) and are docu...
slow5tools v1.4.0 released github.com/hasindu2008/...
Many bit profiles for ex-zd lossy compression added by @hiruna72, who reduced 275TB of historical @nanopore rawdata at @GenTechGp to 172TB.
guide to lossy archive: hasindu2008.github.io/slow5tools/a...
paper: doi.org/10.1101/gr.2...
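To put the archive figures quoted above in proportion, here is a quick back-of-the-envelope calculation. It uses only the two numbers from the post (275 TB before, 172 TB after); the variable names are just for illustration.

```python
# Figures quoted in the post: raw data size before and after lossy compression.
before_tb = 275
after_tb = 172

saved_tb = before_tb - after_tb          # absolute saving in TB
saved_fraction = saved_tb / before_tb    # share of the original archive removed

print(f"saved {saved_tb} TB ({saved_fraction:.1%} of the original archive)")
```

So the lossy profiles shaved a bit over a third off the historical archive.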
28.01.2026 23:46
GitHub - bluenote-1577/savont: Amplicon sequencing variants from 16s ONT R10.4 / HiFi long reads
Amplicon sequencing variants from 16s ONT R10.4 / HiFi long reads - bluenote-1577/savont
Announcing a new tool for "denoising" long-read amplicon sequences: savont.
Savont enables amplicon sequence variants (ASVs) directly from nanopore (or HiFi) long reads. Tested on 16S nanopore amplicons -- seems to work okay.
1/4
github.com/bluenote-157...
28.01.2026 18:45
AmpliPhy improves gene trees by adding homologs without affecting alignments
In phylogenomics, gene tree reconstruction depends on multiple sequence alignment (MSA) and tree inference, and ongoing work continues to improve inference quality. Denser taxon sampling has been associated with improved gene tree inference, suggesting that adding homologs could be a practical route to higher accuracy as sequence databases continue to expand. However, adding sequences can influence multiple steps of typical inference pipelines, and little is known on its specific effect on the multiple sequence alignment, tree reconstruction, and rooting steps. We performed a large-scale empirical benchmark to quantify how homolog enrichment affects alignment and phylogenetic inference. Using an enrichment-impoverishment design and a measure of tree accuracy based on taxonomic congruence, we found that enrichment consistently improves tree inference quality, while effects on alignment quality are marginal. We show that this improvement is associated with accurate root placement on enriched trees when sensitive homolog search is accompanied. Notably, much of the benefit can be retained with relatively compact alignments produced by sequence addition. Building on these observations, we provide a tool, AmpliPhy, which efficiently improves phylogenetic reconstruction of protein families through homolog enrichment. The AmpliPhy open-source pipeline software is available at https://github.com/DessimozLab/ampliphy.
Can ever-increasing sequence databases improve phylogenetic reconstruction of a gene family? Our new preprint introduces AmpliPhy, a pipeline that automates homolog enrichment to improve gene tree inference, built on a robust phylogenomic benchmark scheme. 🧵 1/n
doi.org/10.64898/2026.01.26.701724
28.01.2026 06:10
Registration form for iVoM4
After submitting this form, you will receive the instructions to join our webinars at the email address you provide.
New 2026 iVoM series coming up!
Each session includes SCR and 3 ECRs, & plenty of opportunities to interact with the speakers and ask questions.
Sign up for links/updates: docs.google.com/forms/d/1hAB...
First up: Viral Biotechnologies, Wed, 28th January at 17:00 CET / 11:00 EST / 08:00 PST
22.01.2026 02:42
P2 Solo announcement and the trade-offs of a more stable ONT
a blog for miscellaneous bioinformatics stuff
New blog post with some thoughts on @nanoporetech.com and their recent announcement that the P2 Solo will be discontinued:
rrwick.github.io/2026/01/21/p...
21.01.2026 03:38
Mirdita Lab - Laboratory for Computational Biology & Molecular Machine Learning
Mirdita Lab builds scalable bioinformatics methods.
My time in @martinsteinegger.bsky.social's group is ending, but I'm staying in Korea to build a lab at Sungkyunkwan University School of Medicine. If you or someone you know is interested in molecular machine learning and open-source bioinformatics, please reach out. I am hiring!
mirdita.org
20.01.2026 11:07
GitHub - ebiggers/libdeflate: Heavily optimized library for DEFLATE/zlib/gzip compression and decompression
Heavily optimized library for DEFLATE/zlib/gzip compression and decompression - ebiggers/libdeflate
If you use gzip/gunzip a lot in your pipelines, switch to the faster "libdeflate" versions instead! They use modern CPU capabilities to achieve a 2-3x speedup.
libdeflate is in conda, and "libdeflate-gzip" and "libdeflate-gunzip" are drop-in replacements. #unix
github.com/ebiggers/lib...
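For pipelines driven from Python, one way to adopt the drop-in replacements without breaking machines that lack them is to look them up on PATH and fall back to plain gzip/gunzip. This is only a sketch: the helper name `pick_gzip_tools` is hypothetical, and the only names taken from the post are the `libdeflate-gzip` and `libdeflate-gunzip` binaries.

```python
import shutil


def pick_gzip_tools(which=shutil.which):
    """Return (compressor, decompressor) command names, preferring libdeflate.

    `which` is injectable so the lookup can be tested without touching PATH.
    """
    compress = "libdeflate-gzip" if which("libdeflate-gzip") else "gzip"
    decompress = "libdeflate-gunzip" if which("libdeflate-gunzip") else "gunzip"
    return compress, decompress


if __name__ == "__main__":
    compress, decompress = pick_gzip_tools()
    print(f"compress with: {compress}; decompress with: {decompress}")
```

The returned names can then be passed to `subprocess.run` in place of a hard-coded `gzip`, so the same script runs with or without the conda package installed.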
20.01.2026 01:37
HLi Lab - Vacancies
Openings
Heng Li's lab is looking for a postdoc, "algorithms for sequence alignment, pangenome representation, application of pangenome data structures, or other projects". C/C++/rust proficiency needed.
hlilab.github.io/vacancies
🧬
15.01.2026 04:49
Phold's manuscript is now available @narjournal.bsky.social thanks to @susiegriggo.bsky.social @npbhavya.bsky.social @vijinim.bsky.social @linsalrob.bsky.social @martinsteinegger.bsky.social @milot.bsky.social @eunbelivable.bsky.social & others not on bsky #phagesky academic.oup.com/nar/article/...
14.01.2026 05:10
Releasing alignism, a small tool that I have found useful for doing multiple sequence alignment in browser.
hgbrian.github.io/alignism/
- The hard work was done by the awesome biowasm team!
- Does tree building too
- V fast compared to e.g., muscle on EBI
- Not tested that much!
30.12.2025 17:16
Release Next time I'll try to be FASTA · tseemann/any2fasta
New features
Option -k to keep processing even when some inputs fail
Option -g to include GBK version suffix
Option -s to strip desc from >id desc in ID lines
Support for PDB protein structure forma...
any2fasta 0.8.1 is released!
The FASTA format is now 40 years old (Pearson & Lipman) and any2fasta makes it easy for your scripts and pipelines that accept FASTA to also accept other formats, even if compressed! eg. .gbk.gz
#bioinformatics #microbiology #genomics
github.com/tseemann/any...
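For scripts on the receiving end, here is a minimal sketch of a FASTA reader that accepts plain or gzip-compressed input. It is not how any2fasta works internally and does no format conversion; the function name `read_fasta` is hypothetical, and it assumes the input is already FASTA text.

```python
import gzip
from typing import Iterator, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a plain or gzip-compressed FASTA file."""
    # Gzip streams start with the two magic bytes 0x1f 0x8b.
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open

    header, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield header, "".join(chunks)
```

Checking the magic bytes rather than the file extension means a file named `.fa` that is actually gzipped is still read correctly.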
30.12.2025 05:16