Abstraction & Reasoning Corpus
The Abstraction and Reasoning Corpus (ARC) is a benchmark for measuring AI skill acquisition, allowing us to quantify progress towards human-level AI. The test was introduced in 2019 by François Chollet, a software engineer and AI researcher working at Google. In his seminal paper "On the Measure of Intelligence", he established that intelligence is the ability of an agent to adapt to an ever-changing environment and produce appropriate behavior in never-before-seen situations.
An IQ Test for AI
ARC differs from traditional AI benchmarks in that it does not rely on specific tasks to measure intelligence. Instead, it requires an algorithm to find the solution to varied and previously unknown tasks from only a handful of demonstrations - typically three per task. While humans on average solve 80% of all ARC tasks effortlessly, today's algorithms achieve no more than 31%.
ARC tests the ability of an AI to approach each task from scratch, using only the same kind of prior knowledge about the world that humans inherently possess - so-called core knowledge. For this reason, modern deep-learning models as well as large language models score close to zero on ARC, underscoring the need for novel approaches on the road to human-level AI.
ARC tasks can be solved using only the core knowledge that young children acquire naturally or are born with, and they do not require any specialized expertise. Additionally, solving tasks should not rely on language-specific or culture-specific knowledge (e.g. the names of Hollywood actors). As a general principle, ARC is a test that can be taken by anyone, regardless of their background, including a Martian, a human, or a machine from a hypothetical planet "Metal." To see for yourself, read about the journey of Brainius...
Each task consists of grids with a minimum size of 1x1 and a maximum size of 30x30. Each cell of a grid holds a number between 0 and 9, represented by one of ten different colors.
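As a sketch, such a grid can be represented as a list of lists of integers. The color names below follow a palette commonly used by ARC viewers and are an assumption here, not part of the specification:

```python
# A minimal sketch of an ARC grid: a 2-D array of integers 0-9.
# Color names are assumed (common viewer palette), not an official mapping.
COLORS = {
    0: "black", 1: "blue", 2: "red", 3: "green", 4: "yellow",
    5: "grey", 6: "fuchsia", 7: "orange", 8: "teal", 9: "brown",
}

# A 3x3 example; grid sizes may range from 1x1 up to 30x30.
grid = [
    [0, 1, 0],
    [1, 2, 1],
    [0, 1, 0],
]

height, width = len(grid), len(grid[0])
assert 1 <= height <= 30 and 1 <= width <= 30
assert all(cell in COLORS for row in grid for cell in row)
```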
A test-taker is provided with a set of demonstration or "example" grid pairs, from which the output grid of the "test" must be derived. This includes determining the size of the output grid for the "test" and filling each cell of the grid with the correct color (i.e. number).
Finally, the output grid of the test is considered successfully constructed only if both the size of the grid and the color of each cell exactly match the expected answer.
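This success criterion amounts to a simple exact-match check. A minimal sketch (the function name is hypothetical):

```python
def is_correct(predicted, expected):
    """Return True only if the predicted grid has the same dimensions as
    the expected grid and every cell matches exactly."""
    if len(predicted) != len(expected):      # same number of rows
        return False
    for pred_row, exp_row in zip(predicted, expected):
        if pred_row != exp_row:              # same width and same colors
            return False
    return True

# A single wrong cell - or a wrong grid size - fails the whole task.
expected = [[1, 2], [3, 4]]
print(is_correct([[1, 2], [3, 4]], expected))  # True
print(is_correct([[1, 2], [3, 5]], expected))  # False (one wrong cell)
print(is_correct([[1, 2]], expected))          # False (wrong size)
```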
Now that you know everything about ARC tasks, visit the ARC Playground and solve some ARC tasks! After registering, you can go to the task editor, create ARC tasks of your own, and even submit them for ARC II.
ARC I - Data
Download the training set (400 tasks) and the evaluation set (400 tasks) to start building your solution algorithms:
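Each task in the public ARC repository is stored as a JSON object with "train" and "test" lists of input/output grid pairs. The toy task below is invented for illustration (its grids and the vertical-flip pattern are hypothetical), but the structure matches the distributed files:

```python
import json

# A toy task in the JSON format used by the public ARC data files:
# "train" and "test" are lists of {"input": grid, "output": grid} pairs.
# The grids and their transformation (a vertical flip) are made up here.
task = json.loads("""
{
  "train": [
    {"input": [[1, 0], [0, 1]], "output": [[0, 1], [1, 0]]},
    {"input": [[2, 2], [0, 0]], "output": [[0, 0], [2, 2]]}
  ],
  "test": [
    {"input": [[3, 0], [0, 3]], "output": [[0, 3], [3, 0]]}
  ]
}
""")

for pair in task["train"]:                # the demonstration pairs
    print(pair["input"], "->", pair["output"])

test_input = task["test"][0]["input"]     # grid whose output must be derived
```

In practice you would read each downloaded task file with `json.load` instead of the inline string above.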