Microsoft AI (MAI) has unveiled two new in-house models designed to accelerate its mission of building responsible, human-centered artificial intelligence. The launches mark a major step toward creating applied AI platforms that are reliable, expressive, and capable of supporting diverse user needs.
The first release, MAI-Voice-1, is a highly expressive speech generation model delivering natural, high-fidelity audio across single- and multi-speaker use cases. Now available in Copilot Daily, Podcasts, and a new Copilot Labs experience, the model can generate a full minute of audio in under a second on a single GPU—making it one of the fastest speech systems available. Users can experiment with features like interactive storytelling and guided meditations to experience its expressive capabilities firsthand.
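To put that throughput claim in perspective, a quick back-of-the-envelope calculation (ours, not Microsoft's) shows the implied real-time factor: producing 60 seconds of audio in under one second means synthesis runs at least 60 times faster than playback.

```python
# Back-of-the-envelope check of the reported MAI-Voice-1 throughput.
# The input figures come from the announcement; the calculation is ours.

audio_seconds = 60.0      # one full minute of generated audio
wall_clock_seconds = 1.0  # reported upper bound on generation time (single GPU)

# Real-time factor: seconds of audio produced per second of compute.
rtf = audio_seconds / wall_clock_seconds
print(f"Implied real-time factor: at least {rtf:.0f}x faster than playback")
```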
The second release, MAI-1-preview, is MAI’s first foundation model trained end-to-end in house, built on a mixture-of-experts architecture. Pre-trained and post-trained on approximately 15,000 NVIDIA H100 GPUs, the model is now undergoing public evaluation on LMArena, where the community can test its instruction-following and conversational abilities. It will also begin rolling out to select use cases within Copilot in the coming weeks, with additional access available to trusted testers via API.
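The announcement does not describe MAI-1-preview's internals beyond naming the architecture, but a mixture-of-experts layer can be sketched generically: a learned router sends each token to a small subset of expert feed-forward blocks, so only a fraction of the model's parameters are active per token. The sketch below is illustrative only; the layer sizes, top-k value, and routing scheme are our assumptions, not MAI-1's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions -- illustrative only, not MAI-1-preview's configuration.
d_model, num_experts, top_k = 16, 8, 2

# Each "expert" is a small feed-forward block; here, a single weight matrix.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(num_experts)]
router_w = rng.normal(size=(d_model, num_experts))  # learned router weights

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs.

    Only k of the num_experts blocks run per token, which is how MoE
    models keep per-token compute well below total parameter count.
    """
    logits = x @ router_w                          # (tokens, num_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of best experts
    out = np.zeros_like(x)
    for t, token in enumerate(x):
        gates = logits[t, top[t]]
        gates = np.exp(gates - gates.max())
        gates /= gates.sum()                       # softmax over chosen experts
        for gate, e in zip(gates, top[t]):
            out[t] += gate * (token @ experts[e])
    return out

tokens = rng.normal(size=(4, d_model))             # a batch of 4 token vectors
print(moe_layer(tokens).shape)                     # (4, 16)
```

In production systems the router is trained jointly with the experts, typically with load-balancing losses; the point here is only the sparse-activation pattern that "mixture-of-experts" refers to.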
“Voice is the interface of the future for AI companions, and MAI-Voice-1 delivers high-fidelity, expressive audio across both single and multi-speaker scenarios,” Microsoft AI said in its announcement.
Looking ahead, Microsoft AI plans to expand its portfolio of specialized models tailored to different intents and use cases, combining proprietary technology with partner and open-source innovations. This multi-model strategy aims to deliver trusted, adaptive AI experiences to millions of users worldwide.