ARC

Abstraction & Reasoning Corpus

About ARC

The Abstraction and Reasoning Corpus (ARC) is a benchmark for measuring AI skill acquisition, allowing us to quantify progress towards human-level AI. The test was introduced in 2019 by François Chollet, a software engineer and AI researcher working at Google. In his seminal paper On the Measure of Intelligence, he defined intelligence as the ability of an agent to adapt to an ever-changing environment and produce appropriate behavior in never-before-seen situations.

An IQ Test for AI

ARC differs from traditional AI benchmarks in that it does not rely on specific tasks to measure intelligence. Instead, it requires an algorithm to solve a variety of previously unseen tasks from only a handful of demonstrations, typically 3 per task. While humans on average solve 80% of all ARC tasks effortlessly, today's algorithms achieve no more than 31%.

Intelligence Benchmark

ARC tests the ability of an AI to approach each task from scratch, using only the same kind of prior knowledge about the world that humans inherently possess, so-called core knowledge. For this reason, modern deep-learning models as well as large language models score close to zero on ARC, underscoring the need for novel approaches on the road to human-level AI.

Comparability

ARC tasks can be solved using only the core knowledge that young children are born with or acquire naturally, and do not require any specialized expertise. Solving a task should also not depend on specific acquired knowledge such as language or culture-specific facts (e.g. names of Hollywood actors). As a general principle, ARC is a test that can be taken by anyone, regardless of their background, whether a human, a Martian, or a machine from a hypothetical planet "Metal." To see for yourself, read about the journey of Brainius...

Core Knowledge

ARC Tasks

Each task consists of grids with a minimum size of 1x1 and a maximum size of 30x30. Each cell of a grid holds a number between 0 and 9, and each of the ten numbers is displayed as a different color.

Example grid sizes: 4x4, 17x10, 30x30
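The grid constraints described above can be written down as a small validity check. The following is a minimal sketch, assuming a grid is stored as a list of rows of integers; the names `Grid` and `is_valid_grid` are hypothetical and chosen only for illustration.

```python
from typing import List

# Assumption: a grid is a list of rows, each row a list of ints.
Grid = List[List[int]]

MIN_SIDE, MAX_SIDE = 1, 30  # side limits stated in the text (1x1 to 30x30)
NUM_COLORS = 10             # cell values 0..9, one per color


def is_valid_grid(grid: Grid) -> bool:
    """Return True if grid is rectangular, 1x1 to 30x30, with cells in 0..9."""
    if not isinstance(grid, list) or not (MIN_SIDE <= len(grid) <= MAX_SIDE):
        return False
    first = grid[0]
    width = len(first) if isinstance(first, list) else 0
    if not (MIN_SIDE <= width <= MAX_SIDE):
        return False
    for row in grid:
        # Every row must have the same width as the first row.
        if not isinstance(row, list) or len(row) != width:
            return False
        for cell in row:
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(cell, bool) or not isinstance(cell, int):
                return False
            if not (0 <= cell < NUM_COLORS):
                return False
    return True
```

For example, `is_valid_grid([[0, 1], [2, 3]])` is accepted, while a ragged grid such as `[[1, 2], [3]]` or a grid containing the value 10 is rejected.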

A test-taker is provided with a set of demonstration or "example" grid pairs, each consisting of an input grid and its output grid, from which the output grid of the "test" must be derived. The test-taker must both choose the size of the test output grid and fill each of its cells with the correct color (i.e. number).
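One way to picture this setup is as a search for a transformation that reproduces every demonstration pair and is then applied to the test input. The sketch below is only an illustration under stated assumptions: the pair layout (a dict with "input" and "output" keys), the two candidate transformations, and the function `solve` are all hypothetical, and real ARC tasks need a far richer space of candidates.

```python
from typing import Callable, Dict, List, Optional

Grid = List[List[int]]
Pair = Dict[str, Grid]  # assumed layout: {"input": grid, "output": grid}


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror each row left to right; the grid size is unchanged."""
    return [list(reversed(row)) for row in grid]


def transpose(grid: Grid) -> Grid:
    """Swap rows and columns; a non-square grid changes its size."""
    return [list(col) for col in zip(*grid)]


def solve(
    demos: List[Pair],
    test_input: Grid,
    candidates: List[Callable[[Grid], Grid]],
) -> Optional[Grid]:
    """Apply the first candidate that explains all demonstration pairs."""
    for transform in candidates:
        if all(transform(p["input"]) == p["output"] for p in demos):
            return transform(test_input)
    return None  # no candidate is consistent with the demonstrations


demos = [
    {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
    {"input": [[3, 4, 5]], "output": [[5, 4, 3]]},
]
prediction = solve(demos, [[7, 8, 9], [0, 0, 1]], [transpose, flip_horizontal])
print(prediction)  # [[9, 8, 7], [1, 0, 0]]
```

Here `transpose` is rejected because it fails the first demonstration pair, and `flip_horizontal` is kept because it matches both. Note that the output size is produced by the chosen transformation rather than given in advance.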

Finally, the output grid of the test is considered successfully constructed only if the size of the grid and the color of each cell exactly match the expected answer.
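The all-or-nothing criterion above can be sketched as a short comparison function. This assumes grids are lists of rows of integers; the name `is_correct` is hypothetical.

```python
from typing import List

Grid = List[List[int]]


def is_correct(predicted: Grid, expected: Grid) -> bool:
    """Return True only if both the grid size and every cell match exactly."""
    # The number of rows must match.
    if len(predicted) != len(expected):
        return False
    for p_row, e_row in zip(predicted, expected):
        # The number of columns must match in every row.
        if len(p_row) != len(e_row):
            return False
        # Every cell must hold the same color (number).
        if any(p != e for p, e in zip(p_row, e_row)):
            return False
    return True
```

Under this criterion there is no partial credit: a prediction with a single wrong cell, or with the right colors in a grid of the wrong size, counts as a failure.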

ARC Playground

Now that you know everything about ARC tasks, visit the ARC Playground and solve some yourself! After registering, you can open the task editor, create ARC tasks of your own, and even submit them for ARC II.

ARC I - Data

Download the training set (400 tasks) and the evaluation set (400 tasks) to start building your solution algorithms:

Download ARC
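Once downloaded, the tasks need to be read into memory. The sketch below assumes the layout used by the public ARC repository, where each task is a separate JSON file with a "train" list and a "test" list of input/output grid pairs; the download offered here may be organized differently. The function name `load_tasks` and the directory name in the usage note are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict


def load_tasks(directory: str) -> Dict[str, dict]:
    """Load every *.json file in directory, keyed by file name without suffix.

    Assumption: each file holds one task of the form
    {"train": [{"input": ..., "output": ...}, ...], "test": [...]}.
    """
    tasks: Dict[str, dict] = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            tasks[path.stem] = json.load(handle)
    return tasks
```

A call such as `load_tasks("training")` would then return a dictionary with one entry per task file, assuming the training set was unpacked into a directory of that name.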

For Humankind

Lab42
Villa Fontana
Obere Strasse 22B
7270 Davos, Switzerland

Lab42

Powered by
