https://github.com/samacqua/LARC
Language-annotated Abstraction and Reasoning Corpus
https://github.com/samacqua/LARC
arc
Last synced: about 1 year ago
JSON representation
Language-annotated Abstraction and Reasoning Corpus
- Host: GitHub
- URL: https://github.com/samacqua/LARC
- Owner: samacqua
- License: other
- Created: 2021-01-31T20:19:30.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2023-05-20T01:12:42.000Z (about 3 years ago)
- Last Synced: 2024-11-12T15:43:18.357Z (over 1 year ago)
- Topics: arc
- Language: JavaScript
- Homepage: https://samacquaviva.com/LARC/explore/
- Size: 9.28 MB
- Stars: 78
- Watchers: 4
- Forks: 10
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Language-complete Abstraction and Reasoning Corpus (LARC)
This repository contains the LARC dataset and supporting assets
The entire dataset can be browsed at [the explorer interface](https://samacqua.github.io/LARC/explore) take a look!
Here's a [quick 5 minutes slideslive video explaining this work](https://recorder-v3.slideslive.com/#/share?share=75868&s=71a5a42e-dceb-4055-9ae7-9bfccd4ffc1f)
*"How can we build intelligent systems that achieve human-level performance on challenging and structured domains (like ARC), with or without additional human guidance? We posit the answer may be found in studying natural programs - instructions humans give to each other to communicate how to solve a task. Like a computer program, these instructions can be reliably "executed" by others to produce intended outputs."*
A comprehensive view of this dataset and its goals can be found in [Communicating Natural Programs to Humans and Machines (Neurips Dataset and Benchmark, 2022)](https://arxiv.org/abs/2106.07824)
LARC is curated from a communication game, where
one participant, the *describer* solves an [ARC task](https://github.com/fchollet/ARC) and describes the solution to a different participant,
the *builder*, who must solve the task on the new input using the description alone.
The successful descriptions are "language-complete" in a sense that it fully captures the underlying ARC task in the absence of the original input-output examples.
Citation
```
@article{acquaviva2021communicating,
title={Communicating Natural Programs to Humans and Machines},
author={Acquaviva, Samuel and Pu, Yewen and Kryven, Marta and Wong, Catherine and Ecanow, Gabrielle E and Nye, Maxwell and Sechopoulos, Theodoros and Tessler, Michael Henry and Tenenbaum, Joshua B},
journal={arXiv preprint arXiv:2106.07824},
year={2021}
}
```
The original ARC data can be found here [The Abstraction and Reasoning Corpus](https://github.com/fchollet/ARC)
## Contents
- `dataset` contains the language-complete ARC tasks and successful natural program phrase annotations
- `explorer` contains the explorer code that allows for easy browsing of the annotated tasks
- `collection` contains the source code used to curate the dataset
- `bandit` contains the formulation and environment for bandit algorithm used for collection
language-guided program synthesis code can be found [here](https://github.com/theosech/ec/tree/language-guided_program_synthesis_for_larc)
GPT4 (vision only) program induction results can be found [here](https://github.com/evanthebouncy/larc_gpt4)
## License
The [dataset](https://github.com/samacqua/LARC/tree/main/dataset) is licensed under the [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/)
All supporting code follows the [MIT License](https://opensource.org/licenses/MIT)