I am a Research Scientist at Sesame AI. My work at Sesame focuses on developing human-like conversational agents, which includes real-time audio understanding, response and audio generation.
My broad research interests include Audio-Language and Multimodal Learning. Before joining Sesame, I spent five years on the Microsoft Speech team as a Senior Applied Scientist, where my research was deployed in products like Teams, Edge, and Outlook. I received my PhD from Carnegie Mellon University, advised by Bhiksha Raj. My PhD thesis, Learning Audio Foundation Models for Reasoning, introduced the first set of audio-language models and reasoning for audio.
Academic service:
Links: Google Scholar • GitHub • Twitter • LinkedIn • CV