Test-Time Augmentation to solve ARC

With a dual career spanning clinical psychology and software development, Jack Cole and his team have brought a unique perspective to tackling the ARC Challenge. Leveraging cognitive insights and advanced machine learning techniques, the team’s approach stands out for its novel use of test-time fine-tuning and synthetic data enhancement.

Interview with Jack Cole
ARC World Record Holder

📅 April 16, 2024

1. What is your academic and professional background? Tell us about yourself.

At 17, I founded my first computer-related company. I currently lead Mindware Consulting, Inc., a company that has created a few successful apps with over 30 million downloads. I have also extensively consulted at the intersection of computing and psychology, working on electronic medical record systems and developing software for data collection in psychology and psychophysiology research.

I hold a PhD in Clinical Psychology and maintain a part-time psychotherapy practice. I have a passion for challenging contests and complex problems. As a youth, I was awarded scholarships for winning computer science-related Olympiads. Although I chose to pursue a more human-centered career in psychology, I have continued to maintain a dual career in software development and consulting.

My background in cognitive testing and neuropsychology, gained through my psychology training, has significantly influenced my approach to conceptualizing and solving the problems presented in the ARC Challenge.

2. How did you find out about ARC and what got you hooked to invest this much time into solving the test?

Over the past six years, I have been immersed in studying AI and machine learning. My interest was piqued when I encountered François Chollet’s On the Measure of Intelligence paper. At the time, I was experimenting with GPT-3 and had evaluated some of its capabilities, finding the results intriguing. It was a presentation by Yannic Kilcher on ARC that ultimately inspired me to delve deeper into the challenge.

I began by conducting several experiments with GPT-3, including fine-tuning. Once I started fine-tuning my own models and running experiments, I became thoroughly engrossed in the project. I also collaborated closely with my teammates from last year and 2022, Mohamed Osman, Phung Cheng Fei, and Matteo Batelic. For 2024, the team consists of myself and Mohamed Osman, who is leading our efforts towards publishing a paper on our approach.

3. How would you summarize your ARC solution in a few sentences; what makes it stand out from other solutions?

Our ARC solution stands out due to several key elements. Firstly, we fine-tune models on synthetic and augmented data. Secondly, we employ test-time fine-tuning. Lastly, we have developed an approach called AIRV (augment, inference, reverse augmentation, and vote), which is analogous to test-time augmentation. These innovations are crucial, as transformer models perform relatively poorly on ARC without them.
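
To make the AIRV idea concrete, the short Python sketch below shows the general shape of such a loop. It is a simplified illustration under stated assumptions, not the team's actual code: the model.predict interface and the small set of invertible grid augmentations are hypothetical, and a real ARC setup would apply the same transformation to the demonstration pairs as well as the test input.

    # Minimal AIRV-style sketch: augment, inference, reverse augmentation, vote.
    # model.predict and the augmentation set are illustrative assumptions.
    from collections import Counter
    import numpy as np

    # Invertible grid augmentations as (forward, inverse) pairs.
    AUGMENTATIONS = [
        (lambda g: g,              lambda g: g),                # identity
        (np.rot90,                 lambda g: np.rot90(g, -1)),  # rotate 90 degrees
        (lambda g: np.rot90(g, 2), lambda g: np.rot90(g, 2)),   # rotate 180 degrees
        (np.fliplr,                np.fliplr),                  # horizontal flip
        (np.flipud,                np.flipud),                  # vertical flip
    ]

    def airv_predict(model, test_input):
        """Run the model on several augmented views of the input, undo each
        augmentation on the prediction, and return the majority-vote grid."""
        grid = np.asarray(test_input)
        candidates = []
        for forward, inverse in AUGMENTATIONS:
            augmented = forward(grid)                           # augment
            prediction = np.asarray(model.predict(augmented))   # inference (hypothetical API)
            candidates.append(inverse(prediction))              # reverse augmentation
        # Vote: the most frequently produced grid wins.
        keys = [c.shape + tuple(c.flatten()) for c in candidates]
        winner = Counter(keys).most_common(1)[0][0]
        return candidates[keys.index(winner)]

Because each augmentation is undone before voting, the candidates are all expressed in the original grid orientation, so simple majority voting can reconcile them.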

In recent months, our approach has been bolstered by the outstanding work of Michael Hodel on synthetic data, further enhancing our solution’s effectiveness. Our best single solution model has achieved a maximum score of 33% on Kaggle, besting all previous approaches combined (save for our own ensemble that scored 34% with Lab42).

4. What are your plans on how to reach an even higher score? Are you thinking about developing new AI models or training techniques?

Without any additional innovations, my current approach is likely to keep advancing. It has continued to gain roughly one additional hidden test set item every week or two, thanks to ongoing training on TPUs provided through a grant from the Google TPU Research Cloud. I am currently training different models of various sizes and architectures. If we are able to receive some financial support, I have a large roadmap of additional techniques to explore (largely around notions of self-improving loops or bootstrapping).

5. What is your hope that an ARC solution reaching almost 100% will contribute to AI research? Are you imagining any existing real-life problems that can be solved with an ARC solution?

I feel certain that reaching close to 100% on ARC will greatly contribute to AI research. Chollet designed the benchmark so that solving it requires intelligence, and so far I think it is holding up to that aim: any system that solves ARC is likely to possess intelligence. As I have worked, I have tried to expand my view to consider how the approach could be applied in different contexts. I am considering competing in other competitions with a similar approach, and maybe even the same model. I have tried to keep the model training as broad as possible so that it may have applicability beyond ARC.
