Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/evamaxfield/papers-without-code
A package (and website) to automatically attempt to find the code associated with a paper.
- Host: GitHub
- URL: https://github.com/evamaxfield/papers-without-code
- Owner: evamaxfield
- License: mit
- Created: 2022-11-23T19:06:53.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-03-01T17:51:07.000Z (11 months ago)
- Last Synced: 2024-10-07T19:41:54.981Z (3 months ago)
- Language: Jupyter Notebook
- Homepage: https://paperswithoutcode.org
- Size: 12.6 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
# papers-without-code
[![Build Status](https://github.com/evamaxfield/papers-without-code/workflows/CI/badge.svg)](https://github.com/evamaxfield/papers-without-code/actions)
[![Python Package Documentation](https://github.com/evamaxfield/papers-without-code/workflows/Documentation/badge.svg)](https://evamaxfield.github.io/papers-without-code)

A Python package ([and website](https://paperswithoutcode.org)) to automatically attempt to find GitHub
repositories that are similar to academic papers.

[![Image of the Papers without Code web application homepage](https://raw.githubusercontent.com/evamaxfield/papers-without-code/main/docs/_static/web-landing.png)](https://paperswithoutcode.org)
---
## Installation
**Stable Release:** `pip install papers-without-code`
**Development Head:** `pip install git+https://github.com/evamaxfield/papers-without-code.git`

## Usage
Provide a DOI, SemanticScholarID, CorpusID, ArXivID, ACL,
or URL from semanticscholar.org, arxiv.org, aclweb.org,
acm.org, or biorxiv.org. DOIs can be provided as is.
All other IDs should be given with their type, for example:
`doi:10.18653/v1/2020.acl-main.447`
or `CorpusID:202558505` or `url:https://arxiv.org/abs/2004.07180`.

### CLI
```bash
pip install papers-without-code

pwoc query
# or pwoc path/to/file.pdf
```

### Python
```python
from papers_without_code import search_for_repos

search_for_repos("query")
# search_for_repos("path/to/file.pdf")
```

⚠️ Prior to using PWOC with a PDF, you must be logged in to the Docker CLI via `docker login`
because we automatically fetch, spin up, and tear down containers for processing. ⚠️

## How it Works
In short, we pass the query on to the Semantic Scholar search API
which provides us basic details about the paper. We use
a prompted gpt-3.5-turbo with langchain to extract keywords from the
title and abstract. We then make multiple threaded requests to GitHub's API
for repositories which match the keywords. Once we have all the possible repositories
back, we rank them by similarity between the repository's README and the paper's
abstract (or, if the abstract is not available, its title).

When using Papers without Code locally and providing a filepath, the only change to
this workflow is how paper details are gathered: we use
[GROBID](https://github.com/kermitt2/grobid) to extract the
title, abstract, and author list.

## Documentation
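
To make the last two steps described under "How it Works" more concrete (threaded keyword searches, then ranking by README-to-abstract similarity), here is a minimal sketch. It is not the package's implementation: the function names, the injected `search_fn` callable, and the bag-of-words cosine similarity are all assumptions made for illustration, and the real package may compute similarity differently.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def tokenize(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_candidate_repos(keywords, search_fn, max_workers=4):
    """Run one search per keyword concurrently and merge results by repo name.

    `search_fn` is a hypothetical callable mapping a keyword to a list of
    (repo_name, readme_text) pairs; a real one would call GitHub's API.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = list(pool.map(search_fn, keywords))
    merged = {}
    for batch in batches:
        for name, readme in batch:
            merged[name] = readme  # de-duplicate repos found by several keywords
    return merged


def rank_repos(abstract, repos):
    """Sort repos by README-to-abstract similarity, most similar first."""
    target = tokenize(abstract)
    scored = [
        (name, cosine_similarity(target, tokenize(readme)))
        for name, readme in repos.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up data standing in for real GitHub search results.
    fake_index = {
        "citation embeddings": [
            ("example/doc-embed", "Document embeddings trained on citation graphs"),
        ],
        "transformers": [
            ("example/web-app", "A todo list web application"),
        ],
    }
    repos = find_candidate_repos(list(fake_index), lambda kw: fake_index[kw])
    abstract = "We learn document embeddings from citation graphs using transformers."
    for name, score in rank_repos(abstract, repos):
        print(f"{score:.3f}  {name}")
```

Passing the search function in as an argument keeps the sketch runnable offline; swapping in a function that queries GitHub would leave the merging and ranking logic unchanged.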
For full package documentation please visit [evamaxfield.github.io/papers-without-code](https://evamaxfield.github.io/papers-without-code).
[Exploratory data analysis of the dataset used for testing](https://evamaxfield.github.io/papers-without-code/eda.html)
## Development
See [CONTRIBUTING.md](CONTRIBUTING.md) for information related to developing the code.
**MIT License**