Soham Deshmukh

I am a Research Scientist at Sesame AI. My work at Sesame focuses on developing human-like conversational agents, including real-time audio understanding, response generation, and audio generation.

My broad research interests include audio-language and multimodal learning. Before joining Sesame, I spent five years on the Microsoft Speech team as a Senior Applied Scientist, where my research was deployed in products such as Teams, Edge, and Outlook. I received my PhD from Carnegie Mellon University, advised by Bhiksha Raj. My PhD thesis, Learning Audio Foundation Models for Reasoning, introduced the first set of audio-language models and of reasoning for audio.

Academic service:

[2024] Organized the workshop on Speech and Audio Language Models (SALMA) at ICASSP 2025
[2023] Organized a special session at ICASSP 2023
[Reviewer] ICASSP, INTERSPEECH, NeurIPS, ICLR, DCASE, TASLP

Links: Google Scholar • GitHub • Twitter • LinkedIn • CV