Google Gemini citing sources in its response and helping users double-check that response.

Google Gemini | G2: Make clear how well the system can do what it can do

An example of a question to an AI and its response conveying uncertainty, starting with the expression 'I’m not sure, but…' (Kim et al., 2024; Appendix Figure D2, p.7)

Linguistic uncertainty expressions | 2A: Match the level of precision in UI communication with the system performance – Language

The interface for displaying feedback to participants. Statement: Equality of opportunity is an ideal that cannot be realized with governmental actions. The 2010 Equality Bill in Britain ended up being repealed. Feedback from the AI logical assessment system: If one bill in Britain did not lead to equality of opportunity, does it follow that equality of opportunity cannot be realized with other government actions?

AI-framed questioning | G2: Make clear how well the system can do what it can do

Highlighting tokens with low generation probabilities (left) vs. highlighting tokens with high edit probabilities (right) for three coding tasks (Vasconcelos et al., 2023; p. 8)

Visual uncertainty highlighting | G2: Make clear how well the system can do what it can do

Highlighting tokens with low generation probabilities in the AI outputs. Some tokens in the AI output are highlighted in red.

Uncertainty highlighting | G2: Make clear how well the system can do what it can do
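
The highlighting of low-probability tokens described above could be sketched roughly as follows. This is a minimal illustration, not the method used by Vasconcelos et al.: the function name `highlight_low_probability`, the `[[...]]` markers standing in for red highlighting, the 0.5 threshold, and the example probabilities are all assumptions made for this sketch.

```python
from typing import List, Tuple


def highlight_low_probability(
    tokens: List[Tuple[str, float]], threshold: float = 0.5
) -> str:
    """Mark tokens whose generation probability falls below the threshold.

    Each item is a (token_text, probability) pair. The [[...]] markers are a
    hypothetical stand-in for the red highlighting shown in the interface.
    """
    parts = []
    for text, probability in tokens:
        if probability < threshold:
            parts.append(f"[[{text}]]")  # low confidence: draw attention here
        else:
            parts.append(text)  # confident token: leave unmarked
    return " ".join(parts)


# Made-up probabilities for illustration only.
example = [("return", 0.98), ("sorted", 0.41), ("(", 0.95), ("items", 0.30), (")", 0.97)]
print(highlight_low_probability(example))
```

A real interface would take the probabilities from the model's own token scores and would need to tune the threshold so that highlights stay informative rather than overwhelming.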

A QA model may make a prediction based on insufficient context (left), making it difficult for users to assess its correctness. Providing the necessary background information (right) might reduce the user’s over-reliance.

Background explanations | G2: Make clear how well the system can do what it can do

Maze solving task: Is there a valid path from start to exit? AI explanation: visually shows the path from start to exit. Open-domain QA: What is the capital of Lebanon? AI explanation: According to Wikipedia, Beirut is the capital and largest city of Lebanon.

Verification-focused explanations | 11A: Local explanations

An example claim and the corresponding ChatGPT explanation, retrieved passages (abridged), and contrastive explanations. The claim is true, and the refuting explanation contains factual errors and a reasoning contradiction.
Claim: Only one spacecraft has visited the planet and Neptune has more than 13 moons.
ChatGPT Explanation: Yes, the claim is true. According to the evidence from Wikipedia, only one spacecraft, Voyager 2, has visited Neptune. It flew by the planet on August 25, 1989, and remains the only spacecraft to have visited Neptune. Additionally, Neptune has more than 13 moons. It has a total of 14 known moons, with Triton being the largest and most massive moon.
Retrieved Passage #1: The spacecraft performed a near-encounter with the moon Nereid before it came within 4,400 km of Neptune’s atmosphere on 25 August, then passed close to the planet’s largest moon Triton later the same day. The spacecraft verified the existence of a magnetic field surrounding the planet and discovered that the field was offset from the centre and tilted in a manner similar to the field around Uranus...
Contrastive Explanation - Supporting: The claim is true based on the evidence from Wikipedia. The evidence states that only one spacecraft, Voyager 2, has visited Neptune. It flew by the planet on August 25, 1989, and remains the only spacecraft to have visited Neptune. Additionally, the evidence mentions that Neptune has 14 known moons, including Triton, which is the largest Neptunian moon. Therefore, the claim that only one spacecraft has visited the planet and Neptune has more than 13 moons is supported by the evidence.
Contrastive Explanation - Refuting: The claim is false. According to the evidence from Wikipedia, the spacecraft Voyager 2 visited Neptune on August 25, 1989. The evidence states that there have been discussions and proposals for future missions to Neptune, including an orbital mission and a flyby spacecraft. Therefore, it is clear that more than one spacecraft has visited Neptune. Furthermore, the evidence also mentions that Neptune has 14 known moons, contradicting the claim that it has more than 13 moons.

Contrastive explanations | G2: Make clear how well the system can do what it can do

Task: Alphabetize. Description: Given a list of 18 words, sort them in alphabetical order. Error: Either a missing/extra word in the resulting list, or a pair of adjacent words in the wrong order.
Question: Alphabetize the following words: growing prompts determining recreation evolve payable ruled patrols estimate emergency fate shrimp urges intoxicated narrator revert players pharmaceutical
Answer: determining emergency evolve estimate fate growing intoxicated narrator patrols pharmaceutical payable players prompts recreation revert ruled shrimp urges
Critique: Words misordered: evolve comes alphabetically after estimate

AI Critiques | G2: Make clear how well the system can do what it can do
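
The two error types named in the Alphabetize task (a missing or extra word, or an adjacent pair in the wrong order) can be checked mechanically. The sketch below is a rule-based illustration under stated assumptions, not the model-generated critique shown in the figure: `critique_alphabetized` is a hypothetical helper, and it assumes lowercase ASCII words so that Python's default string comparison matches alphabetical order.

```python
from collections import Counter
from typing import List


def critique_alphabetized(question_words: List[str], answer_words: List[str]) -> List[str]:
    """Return critiques for an attempted alphabetical sort.

    Flags words missing from or added to the answer, then every adjacent
    pair in the answer that appears in the wrong order.
    """
    critiques = []
    expected, got = Counter(question_words), Counter(answer_words)

    # Words in the question that the answer dropped.
    for word in sorted((expected - got).elements()):
        critiques.append(f"Missing word: {word}")
    # Words in the answer that the question never contained.
    for word in sorted((got - expected).elements()):
        critiques.append(f"Extra word: {word}")

    # Adjacent pairs out of order (assumes lowercase ASCII words).
    for first, second in zip(answer_words, answer_words[1:]):
        if first > second:
            critiques.append(
                f"Words misordered: {first} comes alphabetically after {second}"
            )
    return critiques


# Short made-up example, not the 18-word list from the figure.
print(critique_alphabetized(["fate", "evolve", "estimate"], ["evolve", "estimate", "fate"]))
```

A check like this reports every out-of-order adjacent pair it finds, whereas a generated critique may mention only one.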

Hugging Chat’s interface showing model selection, suggested example prompts, an input prompt box and a disclaimer “Generated content may be inaccurate or false”.

Hugging Chat | G2: Make clear how well the system can do what it can do

Copilot in PowerPoint | 1A: Introductory blurb