Note: The voice features referenced in this article are being released gradually and may not yet be available to all users. General availability is coming soon.
This feature measures the usage, engagement rate, and abandonment rate of voice chats. The quality of responses is measured offline by testing the model's audio and text outputs against relevance, search, and hallucination criteria across different languages, environments, and accents. For more information, see the Transparency Note for Microsoft 365 Copilot.
You can speak naturally in a voice chat and interrupt an audio response simply by speaking. When interrupted, Copilot stops speaking, listens to you, and responds based on your new voice input.
For the best response quality, you’re encouraged to reduce background noise during active voice chats.
New audio-specific safety mechanisms are integrated into the real-time GPT-4o model used for a voice chat, as outlined in OpenAI’s official documentation. There are also additional checks for intellectual property, copyright, jailbreaks, and harmful content to be consistent with Microsoft’s AI Safety practices.
For more information, please review the Microsoft Responsible AI Transparency Report and our commitment to ISO/IEC 42001:2023 Artificial Intelligence Management System Standards.
The model is trained on a diverse range of user voices to ensure consistent performance across different accents and speech patterns. To evaluate this, OpenAI tested the model using a fixed assistant voice and simulated user inputs generated by the Voice Engine. These inputs came from two sources: official system voices and 27 diverse English voice samples representing various countries and genders. The model was evaluated on both capabilities (knowledge and common-sense tasks) and safety behavior. Results showed that performance on diverse human voices was only marginally, and not significantly, lower than on system voices, indicating strong generalization across English accents.
Voice chat is currently only available in selected languages. For a complete list, see Supported languages for Microsoft 365 Copilot.
Voice chat leverages web grounding to augment responses with current events and information, improving factual accuracy and usefulness.
Some capabilities supported in text, such as creating images or files or at-mentioning agents, aren't available in a voice chat.
Voice chat in Microsoft 365 Copilot doesn’t use user data to train the model.
Generative AI features strive to provide accurate and informative responses based on the data available. However, answers may not always be accurate because they are generated from patterns and probabilities in language data. Use your own judgment and double-check the facts before making decisions or taking action based on the responses.
While these features have mitigations in place to avoid sharing unexpected offensive content in results and take steps to prevent displaying potentially harmful topics, you may still see unexpected results. We're constantly working to improve our technology to proactively address issues in line with our responsible AI principles.
Copilot includes filters to block offensive language in the prompts and to avoid synthesizing suggestions in sensitive contexts. We continue to work on improving the filter system to more intelligently detect and remove offensive outputs. If you see offensive outputs, please submit feedback by using the thumbs-up/thumbs-down icons so that we can improve our safeguards. Microsoft takes this challenge very seriously, and we are committed to addressing it.
Microsoft 365 Copilot is built on Microsoft's comprehensive approach to security, compliance, and privacy.
If you’re using Microsoft 365 Copilot in your organization (with your work or school account), see Data, Privacy, and Security for Microsoft 365 Copilot.
Once a voice chat ends, use the thumbs-up and thumbs-down buttons to provide feedback or suggestions for improvement.