https://github.com/paperswithcode/axcell
Tools for extracting tables and results from Machine Learning papers
- Host: GitHub
- URL: https://github.com/paperswithcode/axcell
- Owner: paperswithcode
- License: apache-2.0
- Created: 2019-06-27T17:44:50.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-11-28T06:53:08.000Z (almost 2 years ago)
- Last Synced: 2024-11-04T17:47:22.520Z (5 days ago)
- Language: Python
- Homepage:
- Size: 646 KB
- Stars: 396
- Watchers: 13
- Forks: 55
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# AxCell: Automatic Extraction of Results from Machine Learning Papers
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/axcell-automatic-extraction-of-results-from/scientific-results-extraction-on-pwc)](https://paperswithcode.com/sota/scientific-results-extraction-on-pwc?p=axcell-automatic-extraction-of-results-from)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/axcell-automatic-extraction-of-results-from/scientific-results-extraction-on-nlp-tdms-exp)](https://paperswithcode.com/sota/scientific-results-extraction-on-nlp-tdms-exp?p=axcell-automatic-extraction-of-results-from)

This repository is the official implementation of [AxCell: Automatic Extraction of Results from Machine Learning Papers](https://arxiv.org/abs/2004.14356).
![pipeline](https://user-images.githubusercontent.com/13535078/81287158-33e01000-905a-11ea-8573-d716373efbdd.png)
## Requirements
To create a [conda](https://www.anaconda.com/distribution/) environment named `axcell` and install the requirements, run:
```setup
conda env create -f environment.yml
```

Additionally, `axcell` requires `docker`, configured so that it can be run without `sudo`. Run `scripts/pull_docker_images.sh` to download the necessary images.
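Before pulling the images, it can help to confirm that Docker is usable by the current user. The following is a minimal sketch, not part of `axcell`: it assumes that a successful `docker info` run without `sudo` means Docker is set up correctly, and the function name `docker_usable_without_sudo` is hypothetical.

```python
import shutil
import subprocess
from typing import Sequence


def docker_usable_without_sudo(cmd: Sequence[str] = ("docker", "info")) -> bool:
    """Return True if `cmd` (by default `docker info`) succeeds as the current user.

    Hypothetical helper, not part of axcell. `docker info` contacts the Docker
    daemon, so it exits with a non-zero status when the user lacks permission
    to access the Docker socket (the usual symptom of needing `sudo`).
    """
    # The executable must be on PATH (or be a path to an executable file).
    if shutil.which(cmd[0]) is None:
        return False
    try:
        result = subprocess.run(list(cmd), capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if docker_usable_without_sudo():
        print("docker is usable; run scripts/pull_docker_images.sh next")
    else:
        print("docker is missing or cannot be run without sudo")
```

The command is a parameter only so that the check can be exercised without a Docker installation; in normal use the default is enough.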
## Datasets
We publish the following datasets:
* [ArxivPapers](https://github.com/paperswithcode/axcell/releases/download/v1.0/arxiv-papers.csv.xz)
* [SegmentedTables & LinkedResults](https://github.com/paperswithcode/axcell/releases/download/v1.0/segmented-tables.json.xz)
* [PWCLeaderboards](https://github.com/paperswithcode/axcell/releases/download/v1.0/pwc-leaderboards.json.xz)

See the [datasets](notebooks/datasets.ipynb) notebook for an example of how to load the datasets listed above. The [extraction](notebooks/extraction.ipynb) notebook shows how to use `axcell` to extract text and tables from papers.
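The released files are xz-compressed, as their `.csv.xz` and `.json.xz` extensions indicate, so they can also be read with the Python standard library alone. The sketch below is an assumption-laden alternative to the datasets notebook, not its code: the helper names `load_json_xz` and `load_csv_xz` are hypothetical, and because the exact layout of the JSON files is not stated here, the JSON loader falls back to JSON Lines (one value per line) if the file is not a single JSON document.

```python
import csv
import json
import lzma
from pathlib import Path
from typing import Any, Dict, List, Union


def load_json_xz(path: Union[str, Path]) -> Any:
    """Decompress an .xz file and parse it as JSON (hypothetical helper).

    Assumes either a single JSON document or JSON Lines; the real layout of
    the released files should be checked against the datasets notebook.
    """
    with lzma.open(path, "rt", encoding="utf-8") as f:
        text = f.read()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Not a single document: parse each non-empty line separately.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_csv_xz(path: Union[str, Path]) -> List[Dict[str, str]]:
    """Decompress an .xz CSV file and return rows keyed by the header row."""
    with lzma.open(path, "rt", encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))


# Example usage, after downloading the files linked above:
# papers = load_csv_xz("arxiv-papers.csv.xz")
# leaderboards = load_json_xz("pwc-leaderboards.json.xz")
```

If pandas is available, `pandas.read_csv("arxiv-papers.csv.xz")` works as well, since its default `compression="infer"` recognizes the `.xz` extension.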
## Evaluation
See the [evaluation](notebooks/evaluation.ipynb) notebook for a complete example of how to evaluate AxCell on the PWCLeaderboards dataset.
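The Results section reports macro and micro F1. As a rough illustration of the difference, the sketch below assumes that each paper's results are a set of hypothetical `(task, dataset, metric, value)` tuples, that a prediction counts only on an exact tuple match, and that macro F1 averages per-paper F1 scores. These matching and averaging rules are assumptions; the evaluation notebook defines the actual procedure, including any normalization of values.

```python
from typing import Dict, Hashable, List, Set, Tuple

# Hypothetical record format: (task, dataset, metric, value).
Record = Tuple[Hashable, ...]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from counts; defined as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def micro_macro_f1(
    gold: Dict[str, Set[Record]], predicted: Dict[str, Set[Record]]
) -> Tuple[float, float]:
    """Return (micro F1, macro F1) over papers, using exact tuple matching."""
    total_tp = total_fp = total_fn = 0
    per_paper: List[float] = []
    for paper in gold.keys() | predicted.keys():
        g = gold.get(paper, set())
        p = predicted.get(paper, set())
        if not g and not p:
            continue  # nothing to score for this paper
        tp, fp, fn = len(g & p), len(p - g), len(g - p)
        # Micro: pool the counts over all papers before computing F1.
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
        # Macro: compute F1 per paper, then average.
        per_paper.append(f1(tp, fp, fn))
    micro = f1(total_tp, total_fp, total_fn)
    macro = sum(per_paper) / len(per_paper) if per_paper else 0.0
    return micro, macro


if __name__ == "__main__":
    # Made-up records, for illustration only.
    r1, r2, r3, r4 = ("t1", "d1", "m1", "1"), ("t1", "d2", "m1", "2"), ("t2", "d3", "m2", "3"), ("t1", "d1", "m1", "9")
    gold = {"paper-a": {r1, r2}, "paper-b": {r3}}
    predicted = {"paper-a": {r1, r4}, "paper-b": {r3}}
    print(micro_macro_f1(gold, predicted))  # micro ≈ 0.667, macro = 0.75
```

Micro F1 weights papers by how many results they contain, while macro F1 weights each paper equally, which is one reason the two columns in the Results table differ.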
## Training
* [pre-training the language model](notebooks/training/lm.ipynb) on the ArxivPapers dataset
* training the [table type classifier](notebooks/training/table-type-classifier.ipynb) and [table segmentation](notebooks/training/table-segmentation.ipynb) models on the SegmentedTables dataset

## Pre-trained Models
You can download the pre-trained models here:
- [axcell](https://github.com/paperswithcode/axcell/releases/download/v1.0/models.tar.xz): an archive containing the taxonomy, abbreviations, table type classifier, and table segmentation model. See the [results-extraction](notebooks/results-extraction.ipynb) notebook for an example of how to load and run the models.
- [language model](https://github.com/paperswithcode/axcell/releases/download/v1.0/lm.pth.xz): a [ULMFiT](https://arxiv.org/abs/1801.06146) language model pre-trained on the ArxivPapers dataset.

## Results
AxCell achieves the following performance:
| Dataset | Macro F1 | Micro F1 |
| ------- | -------: | -------: |
| [PWC Leaderboards](https://paperswithcode.com/sota/scientific-results-extraction-on-pwc) | 21.1 | 28.7 |
| [NLP-TDMS](https://paperswithcode.com/sota/scientific-results-extraction-on-nlp-tdms-exp) | 19.7 | 25.8 |

## License
AxCell is released under the [Apache 2.0 license](LICENSE).
## Citation
The pipeline is described in the following paper:
```bibtex
@article{axcell,
    title={AxCell: Automatic Extraction of Results from Machine Learning Papers},
    author={Marcin Kardas and Piotr Czapla and Pontus Stenetorp and Sebastian Ruder and Sebastian Riedel and Ross Taylor and Robert Stojnic},
    journal={arXiv preprint arXiv:2004.14356},
    year={2020}
}
```