We have a master internship position with possible PhD extension at @labo-loria.bsky.social
The work focuses on speech enhancement with distributed microphone arrays.
www.linkedin.com/jobs/view/43...
*Darkthrone played in the background* 😂
bonus point (the feature extractor is available on HF): huggingface.co/DBD-research...
Check out the amazing work by my collaborators Lukas and René, who trained a ViT-based masked autoencoder to learn patterns in mel-spectrograms of bird vocalizations without annotations, and then cleverly aggregated the learned features to solve downstream bird species classification tasks. 🐦
[10/10] Wrap-up 🎯
🔹 Unified supervised + unsupervised hashing
🔹 Flexible: works via probing or LoRA
🔹 SOTA hashing in minutes on a single GPU
📄 Paper: arxiv.org/abs/2510.27584
💻 Code: github.com/ilyassmoumma...
Shoutout to my wonderful co-authors Kawtar, Hervé, and Alexis.
[9/10] Strong generalization 🌍
CroVCA produces compact codes that transfer efficiently:
✅ Single HashCoder trained on ImageNet-1k works on downstream datasets without retraining (More experiments and ablations in the paper)
[8/10] Semantically consistent retrieval 🔍
CroVCA retrieves correct classes even for fine-grained or ambiguous queries (e.g., indigo bird, grey langur).
✅ Outperforms Hashing-Baseline
✅ Works with only 16 bits and without supervision
[7/10] Compact yet meaningful codes 💾
Even with just 16 bits, CroVCA preserves class structure.
t-SNE on CIFAR-10 shows clear, separable clusters — almost identical to the original 768-dim embeddings.
[6/10] Strong performance across encoders 💪
Tested on multiple vision encoders (SimDINOv2, DINOv2, DFN…), CroVCA achieves SOTA unsupervised hashing:
[5/10] Fast convergence 🚀
CroVCA trains in just ~5 epochs:
✅ COCO (unsupervised) <2 min
✅ ImageNet100 (supervised) ~3 min
✅ Single GPU
Despite its simplicity, it achieves state-of-the-art retrieval performance.
[4/10] HashCoder 🛠️
A lightweight MLP with final BatchNorm for balanced bits (inspired by OrthoHash). Can be used as:
🔹 Probe on frozen features
🔹 LoRA-based fine-tuning for efficient encoder adaptation
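The post only sketches HashCoder at a high level, so here is a minimal numpy illustration of such a head (the layer sizes, weight init, and the name `hash_coder` are my assumptions, not the paper's): linear → ReLU → linear → per-dimension batch normalization → sign.

```python
import numpy as np

rng = np.random.default_rng(0)

def hash_coder(x, w1, w2, eps=1e-5):
    """Toy HashCoder-style head (illustrative only): MLP -> BatchNorm -> sign.
    Centering each bit dimension over the batch encourages ~balanced bits,
    which is the role the post attributes to the final BatchNorm."""
    h = np.maximum(x @ w1, 0.0)                               # hidden layer with ReLU
    z = h @ w2                                                # raw bit logits
    z = (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)   # batch norm, no affine
    return np.sign(z)                                         # binary codes in {-1, +1}

feats = rng.normal(size=(256, 768))          # stand-in for frozen encoder features
w1 = rng.normal(scale=0.02, size=(768, 512))
w2 = rng.normal(scale=0.02, size=(512, 16))  # 16-bit codes, as in the thread
codes = hash_coder(feats, w1, w2)            # (256, 16) binary codes
```

In the probing setup described above, `feats` would come from a frozen encoder and only `w1`/`w2` would be trained; the LoRA variant would instead adapt the encoder itself with low-rank updates.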
[3/10] Unifying hashing 🔄
Can supervised + unsupervised hashing be done in one framework?
CroVCA aligns binary codes across semantically consistent views:
Augmentations → unsupervised
Class-consistent samples → supervised
🧩 One BCE loss + coding-rate regularizer
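The exact objective is in the paper; below is a rough numpy sketch of the two ingredients named above as I read them. Everything here (the function names, the MCR²-style form of the coding-rate term, the 0.1 weight) is an assumption for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_alignment(z_a, z_b):
    """Align bit probabilities of view A with the hard codes of view B (illustrative)."""
    p = sigmoid(z_a)
    t = (np.sign(z_b) + 1.0) / 2.0  # binarized {0,1} targets from the other view
    return -np.mean(t * np.log(p + 1e-8) + (1 - t) * np.log(1 - p + 1e-8))

def coding_rate(z, eps=0.5):
    """MCR^2-style coding rate (my assumption): rewards codes that spread over all bits."""
    n, d = z.shape
    cov = np.eye(d) + (d / (n * eps**2)) * (z.T @ z)
    return 0.5 * np.linalg.slogdet(cov)[1]

# Bit logits for two augmented views of the same images (the unsupervised case);
# in the supervised case, view B would instead be a class-consistent sample.
z_a = rng.normal(size=(128, 16))
z_b = z_a + 0.1 * rng.normal(size=(128, 16))

loss = bce_alignment(z_a, z_b) - 0.1 * coding_rate(np.tanh(z_a))
```

The appeal of this framing is that switching between the unsupervised and supervised settings only changes how the second view is sampled, not the loss itself.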
[2/10] The challenge ⚡
Foundation models (DINOv3, DFN, SWAG…) produce rich embeddings, but similarity search in high-dimensional spaces is expensive.
Hashing provides fast Hamming-distance search, yet most deep hashing methods are complex, slow, and tied to a single paradigm.
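To make the speed argument concrete, here is a small numpy sketch (my own example, not from the thread) of why binary codes are cheap to search: packed 16-bit codes compare with a single XOR plus popcount instead of a float dot product.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy database of 1000 items, each a 16-bit binary code.
db_codes = rng.integers(0, 2, size=(1000, 16), dtype=np.uint8)
query = rng.integers(0, 2, size=(16,), dtype=np.uint8)

# Pack bits into bytes: each 16-bit code becomes just 2 bytes.
db_packed = np.packbits(db_codes, axis=1)  # shape (1000, 2)
q_packed = np.packbits(query)              # shape (2,)

# Hamming distance = popcount of the XOR between packed codes.
xor = np.bitwise_xor(db_packed, q_packed)
dists = np.unpackbits(xor, axis=1).sum(axis=1)

nearest = int(np.argmin(dists))            # index of the closest database item
```

Compare this with cosine search over 768-dim float embeddings: 2 bytes per item versus 3 KB, and bitwise ops versus floating-point multiplies.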
[1/10] Introducing CroVCA ✨
A simple, unified framework for supervised and unsupervised hashing that converts foundation model embeddings into compact binary codes.
✅ Preserves semantic structure
✅ Trains in just a few iterations
BioDCASE workshop - registration closes next week Oct 10th https://biodcase.github.io/workshop2025/ - Hope to see you there! #bioacoustics
I heard that the Linux client is buggy, I use it on the browser and it's working ok.
for the curious, the code, slides and the article are on Github: github.com/BastienPasde...
love it haha wish I were there to hear Prostitute Disfigurement in an amphitheater
A website to visually browse and explore the ImageNet-1k dataset (there are other supported datasets: IN-12M, WikiMedia, ETH Images, Pixabay, Fashion) navigu.net#imagenet
(Maybe this is already known, but I was happy to discover it this morning)
I'm interested in the quantum and footnotesize ones, how many params should they have 😂
Learning Deep Representations of Data Distributions
Sam Buchanan · Druv Pai · Peng Wang · Yi Ma
ma-lab-berkeley.github.io/deep-represe...
The best Deep Learning book is out, I've been waiting for its release for more than a year. Let's learn how to build intelligent systems via compression.
It feels like we can now fit more noise with more model capacity 🤔 (Figure 6), maybe we need newer architectures and/or newer training losses.
1/ Can open-data models beat DINOv2? Today we release Franca, a fully open-sourced vision foundation model. Franca with a ViT-G backbone matches (and often beats) proprietary models like SigLIPv2, CLIP, and DINOv2 on various benchmarks, setting a new standard for open-source research.
👋 I worked on bioacoustics during my PhD, but I post mostly about AI
🏹 Job alert: Research Scientist at Prior Labs
📍Freiburg or Berlin 🇩🇪
📅 Apply by Dec 31 - preferably earlier
🔗 More info: https://bit.ly/4kqn5rY
Congratz! 👏
my new addiction today: youtu.be/dSyJqwN36ow
I can't wait to see them this summer in Motocultor Festival
the best discovery I've had in recent years, I'm addicted to it now as well 😁
Thank you for making this accessible to everyone! I've read some sections, it is very instructive.
Our computer vision textbook is now available for free online here:
visionbook.mit.edu
We are working on adding some interactive components like search and (beta) integration with LLMs.
Hope this is useful and feel free to submit Github issues to help us improve the text!