George Bouras (@gbouras13)

ProteinTTT is now easy to run on Hugging Face Spaces and Google Colab. We’ll also be presenting the paper at ICLR 2026 🇧🇷
🤗 Hugging Face Space: huggingface.co/spaces/pimen...
⚙️ Google Colab: colab.research.google.com/drive/1l_h7c...
🧵👇

05.03.2026 12:08 👍 39 🔁 9 💬 3 📌 0

Novel transposon Tn8026 acts as a global driver of transmissible linezolid resistance in Enterococcus via a linear plasmid Linezolid is a critical last-resort antimicrobial for multidrug-resistant Enterococcus faecium, particularly against vancomycin-resistant lineages where therapeutic options are severely limited. Whil...

Until joining @loolibear.bsky.social's lab in July, I embarrassingly hadn't had much experience with plasmids.
So when I started, Leah said "here you go, have a look at this dataset".
What a fun ride this has been.
Preprint out today and thread below
www.medrxiv.org/content/10.6...

05.03.2026 05:21 👍 8 🔁 3 💬 1 📌 1

Joining the All-The-Bacteria project provides an annualised return on investment of 8.9% bacterial genomes.

27.02.2026 07:29 👍 17 🔁 3 💬 0 📌 1

Overview — AllTheBacteria documentation

Courtesy of @martibartfast.bsky.social, we have a new release of AllTheBacteria which adds another 322,920 assemblies, covering all ENA (Illumina, isolate) prokaryotes to May 2025.
allthebacteria.readthedocs.io/en/latest/ov...

26.02.2026 15:48 👍 60 🔁 28 💬 0 📌 3

@wytamma.bsky.social's WASM tools have transformed my experience of teaching Python to first-year undergraduate biologists this year. Since last year, I've been teaching ~400 undergrads how to code (functions, lists, dictionaries, loops) over (one hour intro lecture +) two 2-hour practicals. 1/n

25.02.2026 22:33 👍 60 🔁 17 💬 3 📌 1

iVoMs – ISVM

The @isvm-society.bsky.social is organising monthly #virus of #microbes (including #phage) webinars. Nominate a speaker and register for the upcoming events at isvm.org/ivom/

17.02.2026 20:24 👍 12 🔁 6 💬 0 📌 0

Super happy that the AllTheBacteria hypothetical proteins are now in AFDB - hopefully we can start to understand the function of some of them at least 😁

18.02.2026 04:46 👍 39 🔁 7 💬 0 📌 0

AlphaFold Database welcomes community datasets Latest AlphaFold Database update adds high-value datasets for microbial and viral proteins, generated by specialist communities

Delighted to see over 17 million new protein structure predictions from novel proteins in AllTheBacteria are now integrated into the AlphaFold Database at @ebi.embl.org!
Huge work from @gbouras13.bsky.social @oschwengers.bsky.social and friends to generate these.

www.ebi.ac.uk/about/news/u...

17.02.2026 13:52 👍 97 🔁 26 💬 1 📌 2

Cheesymite scroll - Wikipedia

en.wikipedia.org/wiki/Cheesym...

14.02.2026 10:54 👍 3 🔁 0 💬 1 📌 0

Inferring context-specific site variation with evotuned protein language models Abstract. Multiple sequence alignments (MSAs) have been traditionally used for making inferences about site-specific diversity in proteins. Recent advancem

Our paper on inferring context-dependent entropy using protein language models is officially out in NAR Genomics & Bioinformatics! 🧬🤖

with Adam Strange, Jumpei Ito, and @systemsvirology.bsky.social

academic.oup.com/nargab/artic...

details below...

#NARGAB

14.02.2026 06:01 👍 23 🔁 9 💬 1 📌 0

How do bacterial pangenomes evolve, what controls their dynamics, why do they exist?
Fitting a mechanistic model to 450 species from allthebacteria.org suggests that fast vs slow gene exchange (i.e. amount of MGEs) is a major differentiating factor, correlated with phylogeny rather than lifestyle

09.02.2026 10:55 👍 71 🔁 32 💬 1 📌 0

Addressing pandemic-wide systematic errors in the SARS-CoV-2 phylogeny - Nature Methods This Resource paper presents a global SARS-CoV-2 phylogenetic tree of 4,471,579 high-quality genomes consistently constructed by Viridian, an efficient amplicon-aware assembler.

A long time ago in a galaxy far away, there was a SARS-CoV-2 pandemic. Our paper, led by @martibartfast.bsky.social
a) correcting errors in 4.5 million genomes & their phylogeny
b) improving representation of the Global South in public data
www.nature.com/articles/s41...
(thread 1/n)

09.02.2026 15:16 👍 137 🔁 66 💬 3 📌 6

Introducing The Structural History of Eukarya (SHE): The first proteome-scale phylogeny constructed entirely from 3D structure.
We computed 300 trillion alignments across 1,542 species to map the tree of life. 🧵👇 (1/5)

07.02.2026 08:50 👍 84 🔁 40 💬 2 📌 0

Hugging Face – The AI community building the future. We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Super excited to announce the release of gene and intergenic region annotation from the largest bacterial genome and MAG datasets available, including AllTheBacteria, GTDB, SPIRE, HRGM, mOTUs and MGnify - dereplicated and available from HuggingFace huggingface.co/AllTheBacteria

05.02.2026 13:27 👍 16 🔁 13 💬 2 📌 0

ONT read QC strategies for assembly a blog for miscellaneous bioinformatics stuff

New blog post: ONT read QC strategies for assembly
rrwick.github.io/2026/02/05/r...

Mini-study comparing a few QC/subsampling approaches, plus practical notes from my experience.

05.02.2026 03:38 👍 34 🔁 14 💬 0 📌 0

Release v1.12 - Just do it, but don't crash · oschwengers/bakta This is the twelfth minor release (v1.12) providing more than 10 minor improvements and many bug fixes improving runtime stability, IO compatibility, and last but not least user experience. Compati...

🦠🧬🖥️ Bakta v1.12.0 is out

with tons of tiny improvements and bug fixes, too many to list all:

- partial genes on linear seqs
- improved error handling & runtimes
- support Python 3.12 & 3.13
- ...

A huge shout out and thank you to all bug reporters and contributors!

github.com/oschwengers/...

02.02.2026 08:05 👍 9 🔁 7 💬 0 📌 1

A comprehensive catalogue of receptor-binding domains in extracellular contractile injection systems - Nature Communications Extracellular contractile injection systems (eCISs) are bacteriophage tail-derived toxin delivery complexes that are present in many prokaryotes. Here, the authors present an analysis of eCIS tail fib...

A new paper from the lab on virus-like particles called eCISs www.nature.com/articles/s41...

How did bacteria evolve thousands of precision nanoinjectors?

Some bacteria don’t secrete toxins — they inject them using phage-derived machines called extracellular contractile injection systems (eCISs).

26.01.2026 13:26 👍 41 🔁 27 💬 3 📌 0

Multiple protein structure alignment at scale with FoldMason Protein structure is conserved beyond sequence, making multiple structural alignment (MSTA) essential for analyzing distantly related proteins. Computational prediction methods have vastly extended ou...

FoldMason is out now in @science.org. It generates accurate multiple structure alignments for thousands of protein structures in seconds. Great work by Cameron L. M. Gilchrist and @milot.bsky.social.
📄 www.science.org/doi/10.1126/...
🌐 search.foldseek.com/foldmason
💾 github.com/steineggerla...

30.01.2026 06:11 👍 300 🔁 147 💬 4 📌 3

Release slow5tools-v1.4.0 · hasindu2008/slow5tools What's Changed slow5tools skim supports the new auxiliary field open_pore_level introduced in latest ONT pod5 slow5tools degrade has new profiles added (by @sashajenner and @hiruna72) and are docu...

slow5tools v1.4.0 released github.com/hasindu2008/...

Many bit profiles for ex-zd lossy compression added by @hiruna72, who reduced 275TB of historical @nanopore rawdata at @GenTechGp to 172TB.

guide to lossy archive: hasindu2008.github.io/slow5tools/a...
paper: doi.org/10.1101/gr.2...

28.01.2026 23:46 👍 8 🔁 5 💬 1 📌 0

GitHub - bluenote-1577/savont: Amplicon sequencing variants from 16s ONT R10.4 / HiFi long reads Amplicon sequencing variants from 16s ONT R10.4 / HiFi long reads - bluenote-1577/savont

Announcing a new tool for "denoising" long-read amplicon sequences: savont.

Savont enables amplicon sequence variants (ASVs) directly from nanopore (or HiFi) long reads. Tested on 16S nanopore amplicons -- seems to work okay.

1/4

github.com/bluenote-157...

28.01.2026 18:45 👍 51 🔁 28 💬 1 📌 2

AmpliPhy improves gene trees by adding homologs without affecting alignments In phylogenomics, gene tree reconstruction depends on multiple sequence alignment (MSA) and tree inference, and ongoing work continues to improve inference quality. Denser taxon sampling has been associated with improved gene tree inference, suggesting that adding homologs could be a practical route to higher accuracy as sequence databases continue to expand. However, adding sequences can influence multiple steps of typical inference pipelines, and little is known on its specific effect on the multiple sequence alignment, tree reconstruction, and rooting steps. We performed a large-scale empirical benchmark to quantify how homolog enrichment affects alignment and phylogenetic inference. Using an enrichment-impoverishment design and a measure of tree accuracy based on taxonomic congruence, we found that enrichment consistently improves tree inference quality, while effects on alignment quality are marginal. We show that this improvement is associated with accurate root placement on enriched trees when sensitive homolog search is accompanied. Notably, much of the benefit can be retained with relatively compact alignments produced by sequence addition. Building on these observations, we provide a tool, AmpliPhy, which efficiently improves phylogenetic reconstruction of protein families through homolog enrichment. The AmpliPhy open-source pipeline software is available at https://github.com/DessimozLab/ampliphy. ### Competing Interest Statement The authors have declared no competing interest. Swiss National Science Foundation, https://ror.org/00yjd3n13, 216623, 10005715

Can ever-increasing sequence databases improve phylogenetic reconstruction of a gene family? Our new preprint introduces AmpliPhy, a pipeline that automates homolog enrichment to improve gene tree inference, built on a robust phylogenomic benchmark scheme. 🧵1/n
📃 doi.org/10.64898/2026.01.26.701724

28.01.2026 06:10 👍 25 🔁 14 💬 1 📌 0

Registration form for iVoM4 After submitting this form, you will receive the instructions to join our webinars at the email address you provide.

New 2026 iVoM series coming up!

Each session includes SCR and 3 ECRs, & plenty of opportunities to interact with the speakers and ask questions.

Sign up for links/updates: docs.google.com/forms/d/1hAB...

First up: Viral Biotechnologies, Wed, 28th January at 17:00 CET / 11:00 EST / 08:00 PST

22.01.2026 02:42 👍 8 🔁 4 💬 0 📌 1

P2 Solo announcement and the trade-offs of a more stable ONT a blog for miscellaneous bioinformatics stuff

New blog post with some thoughts on @nanoporetech.com and their recent announcement that the P2 Solo will be discontinued:
rrwick.github.io/2026/01/21/p...

21.01.2026 03:38 👍 21 🔁 14 💬 0 📌 0

Mirdita Lab - Laboratory for Computational Biology & Molecular Machine Learning Mirdita Lab builds scalable bioinformatics methods.

My time in @martinsteinegger.bsky.social's group is ending, but I’m staying in Korea to build a lab at Sungkyunkwan University School of Medicine. If you or someone you know is interested in molecular machine learning and open-source bioinformatics, please reach out. I am hiring!
mirdita.org

20.01.2026 11:07 👍 104 🔁 55 💬 7 📌 1

GitHub - ebiggers/libdeflate: Heavily optimized library for DEFLATE/zlib/gzip compression and decompression Heavily optimized library for DEFLATE/zlib/gzip compression and decompression - ebiggers/libdeflate

🗜️⚡ If you use gzip/gunzip a lot in your pipelines, switch to the faster "libdeflate" versions instead! They use modern CPU capabilities to achieve a 2-3x speedup.

libdeflate is in conda, and "libdeflate-gzip" and "libdeflate-gunzip" are drop-in replacements. #unix

github.com/ebiggers/lib...

20.01.2026 01:37 👍 71 🔁 23 💬 1 📌 0
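
For pipelines driven from Python, the drop-in swap described above could be sketched roughly like this. The binary names `libdeflate-gzip` and `gzip` come from the post; the helper name `pick_compressor` and the fallback order are assumptions made for illustration, not part of libdeflate.

```python
import shutil


def pick_compressor(which=shutil.which):
    """Return the name of the first gzip-compatible binary found on PATH.

    Hypothetical helper: prefers libdeflate-gzip (named in the post as a
    drop-in replacement) and falls back to plain gzip.
    The `which` argument exists only so the lookup can be stubbed in tests.
    """
    for name in ("libdeflate-gzip", "gzip"):
        if which(name):
            return name
    raise FileNotFoundError("no gzip-compatible compressor found on PATH")
```

The returned name can be used wherever a script currently hard-codes `gzip`, so the faster binary is picked up when it happens to be installed.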

HLi Lab - Vacancies Openings

Heng Li's lab is looking for a postdoc, "algorithms for sequence alignment, pangenome representation, application of pangenome data structures, or other projects". C/C++/Rust proficiency needed.

hlilab.github.io/vacancies

💻🧬

15.01.2026 04:49 👍 4 🔁 2 💬 0 📌 0

Phold's manuscript is now available @narjournal.bsky.social thanks to @susiegriggo.bsky.social @npbhavya.bsky.social @vijinim.bsky.social @linsalrob.bsky.social @martinsteinegger.bsky.social @milot.bsky.social @eunbelivable.bsky.social & others not on bsky #phagesky academic.oup.com/nar/article/...

14.01.2026 05:10 👍 82 🔁 44 💬 1 📌 1

Releasing alignism, a small tool that I have found useful for doing multiple sequence alignment in browser.

hgbrian.github.io/alignism/

- The hard work was done by the awesome biowasm team!
- Does tree building too
- V fast compared to e.g., muscle on EBI
- Not tested that much!

30.12.2025 17:16 👍 24 🔁 10 💬 0 📌 0

Release Go the distance even faster · tseemann/snp-dists Speed major fix to CPU parallelisation - far better scaling -x maxdiff option to short-circuit distant pairs -L option to only calculate the lower triangle of the matrix (2x faster) Features -t ...

💾 snp-dists 1.2.0

Major upgrade to this distance matrix tool - huge performance gains from multithreading (-j), max distance short circuiting (-x) and lower triangle output (-L).

#bioinformatics #microbiology #genomics
github.com/tseemann/snp...

31.12.2025 01:58 👍 24 🔁 7 💬 2 📌 1
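
To illustrate the two ideas named in the post above (max-distance short-circuiting and lower-triangle output), here is a minimal Python sketch. It is not snp-dists' implementation: the function names are hypothetical, and counting every mismatching column as a difference is a simplifying assumption.

```python
def snp_distance(a, b, maxdiff=None):
    """Count mismatching columns between two aligned sequences.

    If maxdiff is given, stop as soon as the count exceeds it, so distant
    pairs cost less work; the returned value is then maxdiff + 1.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diff = 0
    for x, y in zip(a, b):
        if x != y:
            diff += 1
            if maxdiff is not None and diff > maxdiff:
                break  # short-circuit: this pair is already "too far"
    return diff


def lower_triangle(seqs, maxdiff=None):
    """Compute distances only for pairs (i, j) with j < i.

    The matrix is symmetric with a zero diagonal, so this does about
    half the comparisons of a full matrix.
    """
    names = list(seqs)
    return {
        (names[i], names[j]): snp_distance(seqs[names[i]], seqs[names[j]], maxdiff)
        for i in range(1, len(names))
        for j in range(i)
    }
```

With short-circuiting enabled, any pair above the threshold is reported as `maxdiff + 1` rather than its true distance, which is usually acceptable when only close pairs matter (for example, in cluster detection).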

Release Next time I'll try to be FASTA · tseemann/any2fasta New features Option -k is keep processing even when some inputs fail option -g to include GBK version suffix option s to strip desc from>id desc in ID lines Support for PDB protein structure forma...

💾 any2fasta 0.8.1 is released!

The FASTA format is now 40 years old (Pearson & Lipman) and any2fasta makes it easy for your scripts and pipelines that accept FASTA to also accept other formats, even if compressed! e.g. .gbk.gz

#bioinformatics #microbiology #genomics
github.com/tseemann/any...

30.12.2025 05:16 👍 45 🔁 15 💬 1 📌 0
