@faroit
AudioML research scientist at https://audioshake.ai; before: post-doc @inria@social.numerique.gouv.fr; Editor at https://bsky.app/profile/joss-openjournals.bsky.social. All in 17.68% of grey, located in Frankfurt (Germany)
Enjoyed my first @interspeech.bsky.social conference. Seems like a great community. Well organized and a great venue. This is what big conferences could look like. Take notes, ICASSP!
Now in Rotterdam at @interspeech.bsky.social with @cifkao.bsky.social and @hschreiber.bsky.social
Same here. With Claude 4, pandas becomes usable again, but every time I tried torch models, all the shapes ended up messed up. What I like, though, is that the AI agent often comes up with clever little helper bash scripts to test stuff (because it doesn’t understand the code base 😁)
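For illustration, the kind of quick throwaway check I mean, here in Python rather than bash (the model and shapes are made up, not from any real code base):

# Hypothetical sanity check: fail fast if the wrong tensor shapes got wired up.
import torch

encoder = torch.nn.Conv1d(in_channels=2, out_channels=16, kernel_size=1024, stride=512)
x = torch.randn(4, 2, 44100)                 # (batch, channels, samples) of fake audio
y = encoder(x)
assert y.shape[0] == 4 and y.shape[1] == 16, f"unexpected shape {tuple(y.shape)}"
print(tuple(y.shape))                        # prints (4, 16, 85) for these settings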
Harvard Business on Open Source: When PyTorch left Meta for its own non-profit, "this shift led to a significant decrease in contributions from Meta but a notable increase from external companies...participation increased from complementors (Chip Manufacturers);" papers.ssrn.com/sol3/papers....
🚀 We’re looking for a Master’s student to join our research team for a 6-month internship at AudioShake!
Dive deep into PyTorch, optimize our SOTA audio models, and help make ML sound better (and faster) 🎶
Based in Paris or remote 🇫🇷 → audioshake.notion.site/Internship-M... #AudioML #Internship
Would I ever want to have the reviews written by LLMs? Hell, no!
I think they serve well as a guide for how to do reviews well: “Have you checked x?” I often find actual flaws that I would have missed otherwise. You don’t have to understand a paper to find flaws. Just think of a claim like “we did x to improve y” without backing it up with a citation.
Why? Isn’t the main point to identify flaws? I often find that an LLM lists 10 flaws and only 1-2 of them are valid concerns. So yes, this is dangerous if used without a human in the loop. But I also often get ideas about what to check in detail from the initial LLM summary.
I just wonder whether the reviewer demographics are something specific to your field. I review about 10-20 papers per year, I don’t get paid by the public, and looking at our main conferences like ICASSP, it seems (no hard numbers) that at least half of the reviewers have an industry position.
🚀 New #ICLR2025 Paper Alert! 🚀
Can Audio Foundation Models like Moshi and GPT-4o truly engage in natural conversations? 🗣️🔊
We benchmark their turn-taking abilities and uncover major gaps in conversational AI. 🧵👇
📜: arxiv.org/abs/2503.01174
True. But I thought IEEE owned the idea of paying much more and getting much less than at other conferences :-)
@interspeech.bsky.social New to the speech community, coming from ISMIR/ICASSP/Eusipco/DAFX: how come Interspeech is so much more expensive than other conferences? This makes it very hard for many researchers to get approval!
@fakufaku.bsky.social can I do this with pyroomacoustics? Or do you know a simpler idea?
Not knowing much about spatial audio: how do people render multiple dry mono sources into a wet, reverberated stereo image where each source has a fixed position in space? I guess one could use ambisonic RIRs to create stereo images? But what’s the easier way to handle the positioning?
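To make the question concrete, here is the kind of pyroomacoustics sketch I have in mind (not sure it is the idiomatic way; every dimension, absorption value, and position below is a made-up illustration value):

import numpy as np
import pyroomacoustics as pra

fs = 44100
dry1 = np.random.randn(fs)   # placeholder dry mono sources (1 s of noise each)
dry2 = np.random.randn(fs)

# shoebox room with uniform wall absorption, rendered via the image-source method
room = pra.ShoeBox([8.0, 6.0, 3.0], fs=fs, materials=pra.Material(0.3), max_order=12)

# crude stereo pair: two omni mics spaced 20 cm apart
mics = np.c_[[3.9, 3.0, 1.5], [4.1, 3.0, 1.5]]   # shape (3, n_mics)
room.add_microphone_array(mics)

# each dry source keeps a fixed position in the room
room.add_source([2.0, 1.5, 1.5], signal=dry1)
room.add_source([6.0, 4.5, 1.5], signal=dry2)

room.simulate()
stereo = room.mic_array.signals   # (2, n_samples): wet stereo image

The spaced mic pair only gives level and time differences from the room geometry; proper binaural or ambisonics rendering would need HRTFs or an ambisonic RIR decode on top.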
AudioShake’s Multi-Speaker Separation is the first-ever hi-res solution for isolating overlapping voices. Perfect for media pros, transcription, & AI voice workflows. 🔗www.audioshake.ai/post/introducing-multi-speaker-separation-from-audioshake
How stem separation tech brought the legendary voice of Maria Callas back to life in “Maria.” 🎶 Isolating Callas’s original vocals allowed @warnerclassics.bsky.social and the filmmakers to control and blend her voice with Jolie’s performance. 🔗 Read: www.audioshake.ai/post/audiosh...
We just released the Helium-1 model, a 2B multi-lingual LLM which @exgrv.bsky.social and @lmazare.bsky.social have been crafting for us! Best model so far under 2.17B params on multi-lingual benchmarks 🇬🇧🇮🇹🇪🇸🇵🇹🇫🇷🇩🇪
On HF, under CC-BY licence: huggingface.co/kyutai/heliu...
Our article, "Diffusion Models for Audio Restoration: A Review," is now published in the IEEE Signal Processing Magazine!
A huge thank you to all co-authors Jean-Marie Lemercier, Julius Richter, Simon Welker, Eloi Moliner, and Vesa Välimäki for a great collaboration.
doi.org/10.1109/MSP....
Today, we’re introducing NatureLM-audio: the first large audio-language model tailored for understanding animal sounds. arxiv.org/abs/2411.07186 🧵👇
Where is AGI that charges all my devices and batteries?
Since this is a new platform and mHuBERT-147 just reached 86k downloads, let me do some promotion!
This year we released a compact, powerful multilingual SSL model. Trained on balanced, high-quality, open-license data, this model rivals MMS-1B but is 10x smaller.
huggingface.co/utter-projec...
Looking for reviewers before Christmas
interspeech2025.org. URGENT Challenge organizers: Kohei Saijo, Wangyou Zhang, Samuele Cornell, Robin Scheibler, Chenda Li, Zhaoheng Ni, Anurag Kumar, Marvin Sach, Yihui Fu, Wei Wang, Tim Fingscheidt, Shinji Watanabe
🌟 URGENT Challenge @ #Interspeech2025 🌟
Join the Universal, Robust, & Generalizable Speech EnhancemeNT (URGENT) challenge! Explore noisy corpora, tackle diverse speech degradations, and test scalability across 2 tracks (~2.5k/60k hrs).
🚀 Learn more: urgent-challenge.github.io/urgent2025/
new paper! 🗣️Sketch2Sound💥
Sketch2Sound can create sounds from sonic imitations (i.e., a vocal imitation or a reference sound) via interpretable, time-varying control signals.
paper: arxiv.org/abs/2412.08550
web: hugofloresgarcia.art/sketch2sound
📢 Audio AI Job opportunity at Adobe!
The Sound Design AI Group (SODA) is looking for an exceptional research engineer to join us in building the future of AI-assisted audio and video creation.
Strong ML background, GenAI experience a plus.
Details: adobe.wd5.myworkdayjobs.com/external_exp...
🚨🚨My team @GoogleDeepMind in Tokyo is looking for a talented research scientist to work on audio generative models! 🔊
Please consider applying if you have expertise in the domain or related areas such as multimodal models, video generation 📹, etc.
boards.greenhouse.io/deepmind/job...
€700M and not even generative? Doesn’t seem like a good investment.
www.theguardian.com/world/2024/n...
🎓 Academia or industry 💸? I wrote a detailed point of view on Twitter a few months ago, so maybe I should share it here again. I think most of it is still true; the only slight change would be linked to the GenAI bubble, but only time will tell.
www.darnault-parcollet.fr/documents/Ba...
“My AI startup is just a GPT Wrapper”
The Reality for AI Startups
Everything is bluer with you 🥰