Soham Deshmukh

I am a Senior Applied Scientist on the Microsoft Speech team. Previously, I received my PhD from Carnegie Mellon University and my B.Tech from VJTI.

My broad research interests include Audio/Speech Processing and Multimodal Learning. My research is deployed in products like Teams, Edge, and Outlook. Some recent works include Video Translation, Pengi, and CLAP.

Academic service:

[2024] Organized workshop on Speech and Audio Language Models (SALMA) at ICASSP 2025
[2023] Organized special session at ICASSP 2023
[Reviewer] ICASSP, INTERSPEECH, NeurIPS, ICLR, DCASE, TASLP

Links: Google Scholar • GitHub • Twitter • LinkedIn • CV