Soham Deshmukh

I am a Research Scientist at Sesame AI. My work at Sesame focuses on developing human-like conversational agents, which includes real-time audio understanding, response, and audio generation.

My broad research interests include Audio-Language and Multimodal Learning. Before joining Sesame, I spent five years on the Microsoft Speech team as a Senior Applied Scientist, where my research was deployed in products like Teams, Edge, and Outlook. I received my PhD from Carnegie Mellon University, advised by Bhiksha Raj. My PhD thesis, Learning Audio Foundation Models for Reasoning, introduced the first set of audio-language models and reasoning for audio.

Academic service:

Links: Google Scholar, GitHub, Twitter, LinkedIn, CV
