ARC
Abstraction & Reasoning Corpus
About ARC
- The Abstraction and Reasoning Corpus (ARC) is a unique benchmark designed to measure AI skill acquisition and track progress towards achieving human-level AI.
- ARC was introduced in 2019 by François Chollet, a software engineer and AI researcher at Google.
- Chollet's influential paper, "On the Measure of Intelligence," defines intelligence as an agent's ability to adapt to a constantly changing environment and respond appropriately in novel situations.
An IQ Test for AI
- ARC stands apart from traditional AI benchmarks as it doesn't rely on specific tasks to gauge intelligence.
- Instead, it challenges an algorithm to solve a variety of previously unknown tasks based on a few demonstrations, typically three per task.
- Humans solve about 80% of all ARC tasks on average, while current algorithms manage at most 31%.
Intelligence Benchmark
- ARC evaluates an AI's ability to tackle each task from scratch, using only the kind of prior knowledge about the world that humans naturally possess, known as core knowledge.
- Modern deep-learning models and large language models score near zero on ARC, highlighting the need for innovative approaches to reach human-level AI.
Intelligence Comparability
- ARC tasks can be solved using only the core knowledge that young children naturally acquire or are born with, without requiring any specialized expertise.
- Task solutions should not depend on acquired knowledge tied to a particular language or culture (e.g., names of Hollywood actors).
- As a general principle, ARC is a test that can be taken by anyone, regardless of their background, including a Martian, a human, or a machine from a hypothetical planet "Metal."
- To experience it firsthand, explore the journey of Brainius in the short story.
ARC Task Structure
- Each task is composed of grids that range in size from a minimum of 1x1 to a maximum of 30x30.
- The cells within the grid are filled with a number from 0 to 9, each represented by a distinct color, totaling ten different colors.
Example grid sizes: 4x4, 17x10, 30x30.
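The grid constraints above can be sketched in code. This is a minimal illustration, assuming a grid is stored as a list of rows of integers; the names `Grid`, `is_valid_grid`, and `grid_shape` are hypothetical and not part of any official ARC tooling.

```python
from typing import List, Tuple

# Assumption: a grid is a list of rows, each row a list of integers.
Grid = List[List[int]]

MAX_SIDE = 30    # largest grid side stated above (30x30)
NUM_COLORS = 10  # cell values 0..9, one color each


def is_valid_grid(grid: Grid) -> bool:
    """Return True if the grid is rectangular, between 1x1 and 30x30,
    and every cell holds an integer from 0 to 9."""
    if not grid or len(grid) > MAX_SIDE:
        return False
    width = len(grid[0])
    if not (1 <= width <= MAX_SIDE):
        return False
    for row in grid:
        if len(row) != width:
            return False  # ragged rows are not a grid
        for cell in row:
            if not isinstance(cell, int) or not (0 <= cell < NUM_COLORS):
                return False
    return True


def grid_shape(grid: Grid) -> Tuple[int, int]:
    """Return (number of rows, number of columns) of a valid grid."""
    return (len(grid), len(grid[0]))
```

For instance, `is_valid_grid([[0, 1], [2, 3]])` holds, while a grid containing the value 10 or a row of 31 cells is rejected.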
Test Procedure
- Test-takers are given a set of demonstration grid pairs. These serve as examples from which they must derive the output grid for the actual test.
- The task involves determining the size of the output grid for the test and correctly filling each cell of the grid with the appropriate color or number.
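The procedure can be illustrated with a toy example. The sketch below assumes each demonstration is an (input, output) pair of grids and checks whether one hand-written candidate rule reproduces every demonstration before applying it to the test input. The grids, the rule `mirror_left_right`, and the helper `fits_all_demonstrations` are all invented for illustration; real ARC tasks require discovering the rule, which is the hard part.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]
Pair = Tuple[Grid, Grid]  # (input grid, expected output grid)


def mirror_left_right(grid: Grid) -> Grid:
    """Hypothetical candidate rule: flip each row horizontally."""
    return [list(reversed(row)) for row in grid]


def fits_all_demonstrations(rule: Callable[[Grid], Grid],
                            pairs: List[Pair]) -> bool:
    """Return True if the rule maps every demonstration input
    to its demonstration output exactly."""
    return all(rule(inp) == out for inp, out in pairs)


# Three made-up demonstration pairs (not taken from the ARC dataset).
demonstrations: List[Pair] = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 3, 0]], [[0, 3, 3]]),
    ([[0, 5], [5, 0], [0, 0]], [[5, 0], [0, 5], [0, 0]]),
]
test_input: Grid = [[7, 0, 0], [0, 8, 0]]

# Only trust the rule if it explains every demonstration.
if fits_all_demonstrations(mirror_left_right, demonstrations):
    prediction = mirror_left_right(test_input)
else:
    prediction = None
```

Note that the output size here follows from the rule itself; in general the test-taker must infer the output dimensions as well as the cell values.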
Criteria for Success
- The construction of the output grid is deemed successful only if both the size of the grid and the color of each cell precisely match the expected answer.
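A minimal sketch of this all-or-nothing criterion, assuming grids are lists of rows of integers. The function names `same_size` and `is_correct` are hypothetical; an official scorer may differ in detail.

```python
from typing import List

Grid = List[List[int]]


def same_size(a: Grid, b: Grid) -> bool:
    """Return True if both grids have the same number of rows
    and every corresponding row has the same length."""
    return len(a) == len(b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(a, b)
    )


def is_correct(predicted: Grid, expected: Grid) -> bool:
    """Exact-match check: the size must agree and every cell must agree.
    There is no partial credit for a nearly right grid."""
    if not same_size(predicted, expected):
        return False
    return all(
        p == e
        for row_p, row_e in zip(predicted, expected)
        for p, e in zip(row_p, row_e)
    )
```

A single wrong cell, or a grid with one extra row or column, makes the whole answer incorrect.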
ARC Playground
- Now that you're familiar with ARC tasks, it's time to put your knowledge to the test in the ARC Playground.
- After registering, you'll gain access to the task editor where you can create your own ARC tasks. You even have the opportunity to submit your tasks for inclusion in ARC II.
Getting Started
- To begin constructing your solution algorithms, please download both the training set, which consists of 400 tasks, and the evaluation set, also comprising 400 tasks.
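Once downloaded, the tasks can be read into memory. The sketch below assumes the download unpacks into directories of JSON files, one file per task, each containing "train" and "test" lists of objects with "input" and "output" grids, as in the publicly released ARC data. The directory name in the usage comment and the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict


def load_tasks(directory: str) -> Dict[str, dict]:
    """Load every *.json file in the directory, keyed by file name
    without its extension."""
    tasks = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            tasks[path.stem] = json.load(handle)
    return tasks


def count_pairs(task: dict) -> Dict[str, int]:
    """Count demonstration and test pairs, assuming "train"/"test" keys."""
    return {
        "train": len(task.get("train", [])),
        "test": len(task.get("test", [])),
    }


# Usage (the path is an assumption about where the files were unpacked):
# training = load_tasks("data/training")
# print(len(training))  # number of task files found
```

If the downloaded files use a different layout, only `load_tasks` and the key names in `count_pairs` need to change.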
For Humankind
Villa Fontana Obere Strasse 22B 7270 Davos, Switzerland