How vision in Microsoft 365 Copilot works

Vision in Microsoft 365 Copilot lets you share your desktop screen or mobile camera and ask Copilot about what you’re seeing, with answers grounded in both the shared content and your work data. To start using it, see Get started with vision in Microsoft 365 Copilot.

How does vision work?

Vision turns what you share into information Copilot can reason over, then combines it with your work data to answer you out loud.

You share visual input during a voice conversation: your desktop screen (on Windows or the web) or your mobile camera.
Copilot converts the shared screen or camera content into data it can analyze, including text, images, charts, and on-screen interfaces.
Copilot grounds its response by combining what it sees with your Microsoft 365 work data, such as documents, emails, and earlier discussions.
Copilot answers by voice in real time, so you can ask follow-up questions in the same conversation.
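
As a purely conceptual illustration of the four steps above, here is a minimal Python sketch. Every name in it (Frame, VoiceSession, share, ask) is hypothetical, and the word-overlap matching is only a stand-in for grounding. It does not reflect how Copilot is implemented or any Microsoft API.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One captured image from a shared screen or camera (hypothetical type)."""
    source: str        # "screen" or "camera"
    visible_text: str  # stand-in for whatever is extracted from the image


@dataclass
class VoiceSession:
    """Hypothetical model of one voice conversation with vision turned on."""
    work_data: list[str]  # stand-in for documents, emails, and so on
    frames: list[Frame] = field(default_factory=list)

    def share(self, frame: Frame) -> None:
        # Step 1: the user shares visual input during the conversation.
        self.frames.append(frame)

    def ask(self, question: str) -> str:
        if not self.frames:
            return "Nothing is being shared yet."
        # Step 2: treat the most recent frame as the content to analyze.
        latest = self.frames[-1]
        # Step 3: "ground" the answer by finding work items that share
        # at least one word with what is visible (a toy assumption).
        words = set(latest.visible_text.lower().split())
        related = [d for d in self.work_data if words & set(d.lower().split())]
        # Step 4: return the answer (the real feature replies by voice).
        return (
            f"Q: {question} | seeing ({latest.source}): {latest.visible_text} "
            f"| related work items: {len(related)}"
        )
```

In this sketch, a session created with two work items, one of which mentions "Q3", reports one related work item after a frame whose visible text is "Q3 revenue dashboard" has been shared. Because frames live only on the session object, nothing carries over to a new session, which mirrors the session-bound behavior described in this article.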

What can vision do?

Here are some of the most useful things you can do with vision:

Analyze and explain what you’re looking at. Ask Copilot to interpret charts, tables, slides, or dense text. For example, “Summarize this dashboard and highlight any anomalies.”
Turn feedback into next steps. Share a page of a document, email, or chat and ask, “What feedback did I receive on this, and what are my next steps?”
Get a project status update. Share a tracker or work item and ask, “Give me a quick status of this item, the current owner, and the next milestone.”
Get in-context, step-by-step help. Point your phone camera at a device error and ask, “How do I fix this error?”

What are the limitations of vision?

Vision is currently available only through voice chats with Copilot.
Daily use depends on available capacity and counts toward your daily voice usage. You’ll be notified in the Copilot app as you approach your usage limit.
Vision can’t read videos or animated GIFs.
Switching between windows too quickly while asking a question may cause Copilot to respond based on the wrong screen content.
Vision can’t take action or directly manipulate items on your screen.
Vision has no long-term recall. It doesn’t reuse screens or camera input from previous sessions.

Note

These are the capabilities at launch. Ongoing improvements may address some of these limitations over time.

What does vision have access to?

During an active session, Copilot processes the screen or camera content you choose to share. To ground its answers, it also uses your Microsoft 365 work data, such as documents, emails, meetings, and prior discussions.

Visual input is user-initiated and session-bound, so Copilot only works with content shared during the active session.
Shared screen or camera content is processed as a series of images. Audio and video data are temporarily stored so that you can provide feedback to Microsoft, and are deleted after 48 hours.
Text transcripts from voice conversations are stored and managed the same way as text conversations in the Microsoft 365 Copilot app.
Vision adheres to Microsoft’s enterprise-grade commitments to data security and privacy, and it doesn’t infer sensitive personal attributes such as race or emotion.

For more information, see Data, Privacy, and Security for Microsoft 365 Copilot.

What types of content are supported?

Vision can understand a range of on-screen and real-world content, including:

Text, images, charts, tables, and dashboards.
App interfaces and multi-window workflows on desktop.
Physical objects viewed through your mobile camera.

Which languages are supported?

Vision is available in all languages supported by Microsoft 365 Copilot. Some languages may be more prone to pronunciation or recognition differences. For the complete list, see Supported languages for Microsoft 365 Copilot.

How are vision conversations evaluated? What metrics are used to measure performance?

To ensure quality, Copilot is given test questions, and its responses are evaluated against criteria such as relevance, correctness, completeness, tone, and instruction adherence. To better measure performance, we also review feature usage, thumbs-up rate, engagement rate, and user feedback. In addition, we conduct a wide range of safety evaluations to help ensure that vision responds responsibly to harmful queries.

How do I provide feedback about vision in Copilot?

You can provide feedback after you end a voice chat. Thumbs-up and thumbs-down options should appear in your chat history. The feedback you submit there is shared with Microsoft for feature improvements. We don’t use this feedback to train the foundation models used by Copilot.