Abstract Reasoning Puzzles

Newsthink on MSN

What is the puzzle that 90% of people fail?

A simple card puzzle has been used for decades to test human reasoning. Known as the Wason Selection Task, it asks ...

Can you solve this Secret Genius question 'only very few people will get right'

Are you a Secret Genius? Channel 4's exciting new competition programme featuring Alan Carr and Susie Dent challenges individuals from diverse backgrounds to uncover those who possess remarkable ...

PC Magazine

Google Gemini 3.1 Pro Is Here, Beats Rivals in Key AI Benchmarks

It may feel like yesterday that Google released Gemini 3 Pro, but it's back with another update—Gemini 3.1 Pro—featuring big improvements and impressive benchmarks. "3.1 Pro is designed for tasks ...

Gizmochina

Google releases Gemini 3.1 Pro, pushing deeper into the AI reasoning race

Google has introduced Gemini 3.1 Pro, the latest version of its advanced AI model. The update delivers significant improvements in reasoning and problem-solving, making it one of the most powerful AI ...

Popular Mechanics

Scientists Found AI’s Fatal Flaw—The Most Advanced Models Are Failing Basic Logic Tests

Here’s what you’ll learn when you read this story: Large language models (LLMs) like ChatGPT show reasoning errors across many domains. Identifying vulnerabilities is good for public safety, industry, ...

The Conversation

Why comparisons between AI and human intelligence miss the point

Celeste Rodriguez Louro receives funding from Google. Jennifer Rodger does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article ...

Hosted on MSN

Additional reasoning puzzles drawn from FBI agent exams

The third part of the FBI Special Agent exam challenge. ...

VentureBeat

Databricks' OfficeQA uncovers disconnect: AI agents ace abstract tests but stall at 45% on enterprise docs

There is no shortage of AI benchmarks in the market today, with popular options like Humanity's Last Exam (HLE), ARC-AGI-2 and GDPval, among numerous others. AI agents excel at solving abstract math ...

NBC News

AI's capabilities may be exaggerated by flawed tests, according to new study

Researchers behind a new study say that the methods used to evaluate AI systems’ capabilities routinely oversell AI performance and lack scientific rigor. The study, led by researchers at the Oxford ...
