https://github.com/karpathy/arxiv-sanity-lite
arxiv-sanity lite: tag arxiv papers of interest and get recommendations of similar papers in a nice UI, using SVMs over tfidf feature vectors based on paper abstracts.
arxiv deep-learning flask machine-learning
- Host: GitHub
- URL: https://github.com/karpathy/arxiv-sanity-lite
- Owner: karpathy
- License: mit
- Created: 2021-11-13T04:34:22.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2023-06-19T16:23:02.000Z (over 1 year ago)
- Last Synced: 2024-10-15T08:41:06.322Z (24 days ago)
- Topics: arxiv, deep-learning, flask, machine-learning
- Language: Python
- Homepage: https://arxiv-sanity-lite.com
- Size: 991 KB
- Stars: 1,163
- Watchers: 22
- Forks: 131
- Open Issues: 11
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-list - arxiv-sanity lite - Tag arxiv papers of interest and get recommendations of similar papers in a nice UI, using SVMs over tfidf feature vectors based on paper abstracts. (Programming Language Tutorials / JavaScript)
README
# arxiv-sanity-lite
A much lighter-weight arxiv-sanity from-scratch re-write. Periodically polls arxiv API for new papers. Then allows users to tag papers of interest, and recommends new papers for each tag based on SVMs over tfidf features of paper abstracts. Allows one to search, rank, sort, slice and dice these results in a pretty web UI. Lastly, arxiv-sanity-lite can send you daily emails with recommendations of new papers based on your tags. Curate your tags, track recent papers in your area, and don't miss out!
I am running a live version of this code on [arxiv-sanity-lite.com](https://arxiv-sanity-lite.com).
![Screenshot](screenshot.jpg)
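As a rough illustration of what "SVMs over tfidf features" can look like, here is a dependency-free toy sketch. It is not the code in this repo: the whitespace tokenizer, the smoothed idf formula, the Pegasos-style trainer, the example abstracts and every function name are assumptions made for illustration, and the candidate papers are held out from training to keep the example small.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised sparse tf-idf vector ({term: weight}) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed idf, so terms present in every document keep a small weight.
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def dot(weights, vec):
    """Sparse dot product between a weight dict and a tf-idf vector."""
    return sum(weights.get(t, 0.0) * v for t, v in vec.items())


def train_linear_svm(vectors, labels, lam=0.1, epochs=50):
    """Fit a bias-free linear SVM with Pegasos-style subgradient steps on the hinge loss."""
    weights = {}
    step = 0
    for _ in range(epochs):
        for vec, label in zip(vectors, labels):
            step += 1
            eta = 1.0 / (lam * step)  # decaying step size
            violated = label * dot(weights, vec) < 1.0
            shrink = 1.0 - 1.0 / step  # effect of the L2 regulariser
            for term in weights:
                weights[term] *= shrink
            if violated:  # the example is inside the margin: move towards it
                for term, value in vec.items():
                    weights[term] = weights.get(term, 0.0) + eta * label * value
    return weights


# Made-up abstracts: papers carrying one tag, other known papers, and new candidates.
tagged = [
    "transformer attention for language modeling",
    "attention mechanisms in transformer language models",
]
others = [
    "galaxy formation in cosmological simulations",
    "protein folding with molecular dynamics",
]
candidates = [
    "dark matter halos and galaxy clusters",
    "sparse attention transformer for long sequences",
]

vectors = tfidf_vectors(tagged + others + candidates)
n_train = len(tagged) + len(others)
weights = train_linear_svm(vectors[:n_train], [1] * len(tagged) + [-1] * len(others))

# Rank the held-out candidates by SVM score, highest first.
ranked = sorted(
    ((paper, dot(weights, vec)) for paper, vec in zip(candidates, vectors[n_train:])),
    key=lambda pair: pair[1],
    reverse=True,
)
for paper, score in ranked:
    print(f"{score:+.3f}  {paper}")
```

The candidate that shares vocabulary with the tagged papers gets a positive score and ranks first, while the unrelated one scores negative. For anything beyond a toy, a library such as scikit-learn would be the more natural choice for both steps.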
#### To run
To run this locally I usually run the following script to update the database with any new papers. I typically schedule this via a periodic cron job:
```bash
#!/bin/bash

python3 arxiv_daemon.py --num 2000

if [ $? -eq 0 ]; then
    echo "New papers detected! Running compute.py"
    python3 compute.py
else
    echo "No new papers were added, skipping feature computation"
fi
```
You can see that updating the database is a matter of first downloading the new papers via the arxiv api using `arxiv_daemon.py`, and then running `compute.py` to compute the tfidf features of the papers. Finally to serve the flask server locally we'd run something like:
```bash
export FLASK_APP=serve.py; flask run
```
All of the database will be stored inside the `data` directory. Finally, if you'd like to run your own instance on the interwebs I recommend simply running the above on a [Linode](https://www.linode.com), e.g. I am running this code currently on the smallest "Nanode 1 GB" instance indexing about 30K papers, which costs $5/month.
(Optional) Finally, if you'd like to send periodic emails to users about new papers, see the `send_emails.py` script. You'll also have to `pip install sendgrid`. I run this script in a daily cron job.
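If you would rather drive the database update from Python than from bash, the logic of the update script above can be sketched roughly as follows. The script names and `--num 2000` come from that script. The `update_database` function and its `run` parameter (an injection point so the logic can be exercised without hitting the arxiv API) are hypothetical.

```python
import subprocess
import sys


def update_database(run=subprocess.run):
    """Fetch new papers, then recompute features only if the fetch exited with status 0."""
    fetch = run([sys.executable, "arxiv_daemon.py", "--num", "2000"])
    if fetch.returncode == 0:
        print("New papers detected! Running compute.py")
        run([sys.executable, "compute.py"])
        return True
    print("No new papers were added, skipping feature computation")
    return False
```

Calling `update_database()` from the repository root should behave like the bash script: `compute.py` only runs when `arxiv_daemon.py` reports new papers through a zero exit status.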
#### Requirements
Install via requirements:
```bash
pip install -r requirements.txt
```
#### Todos
- Make website mobile friendly with media queries in css etc
- The metas table should not be a sqlitedict but a proper sqlite table, for efficiency
- Build a reverse index to support faster search, right now we iterate through the entire database
#### License
MIT