
Shinji Watanabe

@shinjiw

I have been working at CMU since 2021. Previously, I worked at NTT (2001-2011), MERL (2012-2017), and JHU (2017-2020). My main research topic is speech and audio processing.

387
Followers
61
Following
5
Posts
22.11.2024
Joined

Latest posts by Shinji Watanabe @shinjiw


Can self-supervised models 🤖 understand allophony 🗣? Excited to share my new #NAACL2025 paper: Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment arxiv.org/abs/2502.07029 (1/n)

29.04.2025 17:00 👍 15 🔁 10 💬 2 📌 0

📢 Introducing VERSA: our new open-source toolkit for speech & audio evaluation!

- 80+ metrics in one unified interface
- Flexible input support
- Distributed evaluation with Slurm
- ESPnet compatible

Check out the details
wavlab.org/activities/2...
github.com/wavlab-speec...

28.04.2025 19:50 👍 4 🔁 0 💬 0 📌 0

New #NAACL2025 demo! Excited to introduce ESPnet-SDS, a new open-source toolkit that provides unified web interfaces for both cascaded & end-to-end spoken dialogue systems, with real-time evaluation and more!
📜: arxiv.org/abs/2503.08533
Live Demo: huggingface.co/spaces/Siddh...

17.03.2025 14:29 👍 7 🔁 5 💬 1 📌 0

🚀 New #ICLR2025 Paper Alert! 🚀

Can Audio Foundation Models like Moshi and GPT-4o truly engage in natural conversations? 🗣️🔊

We benchmark their turn-taking abilities and uncover major gaps in conversational AI. 🧵👇

📜: arxiv.org/abs/2503.01174

05.03.2025 16:03 👍 9 🔁 6 💬 1 📌 0

📣 #SpeechTech & #SpeechScience people

We are organizing a special session at #Interspeech2025 on: Interpretability in Audio & Speech Technology

Check out the special session website: sites.google.com/view/intersp...

Paper submission deadline 📆 12 February 2025

06.12.2024 21:29 👍 16 🔁 9 💬 1 📌 1

Excited to announce the launch of our ML-SUPERB 2.0 challenge @interspeech.bsky.social 2025! Join us in pushing the boundaries of multilingual ASR and LID! 🚀

💻 multilingual.superbbenchmark.org

04.12.2024 18:09 👍 8 🔁 3 💬 0 📌 0

We are excited to announce the launch of ML SUPERB 2.0 (multilingual.superbbenchmark.org) as part of the Interspeech 2024 official challenge! We hope this upgraded version of ML SUPERB advances universal access to speech processing worldwide. Please join us!

#Interspeech2025

04.12.2024 14:45 👍 20 🔁 9 💬 1 📌 1

This is my first official post on Bluesky, with great news :)

We got the best paper award at IEEE SLT'24! This work elegantly and straightforwardly solves contextual biasing issues with a dynamic vocabulary arxiv.org/abs/2405.13344. Congrats, Yui, Yosuke, Shakeel, and Yifan! I'm super happy!

04.12.2024 14:15 👍 40 🔁 7 💬 2 📌 1

Part II

@pzelasko.bsky.social
@smfsamir.bsky.social
@juice500ml.bsky.social
@popcornell.bsky.social
@wanchichen.bsky.social
@holgerbovbjerg.bsky.social
@cromz22.bsky.social
@siddhant-arora.bsky.social
@mmiagshatoy.bsky.social

26.11.2024 20:54 👍 5 🔁 0 💬 0 📌 0

I just collected them (some of them may already be there)

Part I
@pengyf.bsky.social
@brianyan918.bsky.social
@albertzeyer.bsky.social
@oplatek.bsky.social
@shikharb.bsky.social
@zaidsheikh.bsky.social
@kalvinchang.bsky.social

26.11.2024 20:54 👍 7 🔁 0 💬 1 📌 0
Multimodal Information Based Speech Processing (MISP) 2025 Challenge

Hi speech people, super exciting news here!

We are running another "Multimodal Information based Speech Processing (MISP)" Challenge at @interspeech.bsky.social

Participate!
Spread the word!

More info 👇
mispchallenge.github.io/mispchalleng...

25.11.2024 11:25 👍 15 🔁 7 💬 0 📌 0