https://github.com/dfm/data.arxiv.io

Code and website for my arxiv abstract dataset

Host: GitHub
URL: https://github.com/dfm/data.arxiv.io
Owner: dfm
License: mit
Created: 2013-11-27T17:43:41.000Z (over 12 years ago)
Default Branch: main
Last Pushed: 2020-06-12T18:17:21.000Z (almost 6 years ago)
Last Synced: 2025-04-13T19:54:08.634Z (about 1 year ago)
Language: Python
Size: 7.81 KB
Stars: 9
Watchers: 2
Forks: 3
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

README

A little script that scrapes the metadata from [the arXiv](http://arxiv.org)
and saves it in a form that is useful for statistical analysis.

Usage
-----

You'll need to install [NLTK](http://nltk.org) first and then run

```
python scrape.py
```

This takes many hours to run. It saves files of the form
`data/astro-ph/2007-05-10.txt.gz`, with one abstract per line. Each line has
the following tab-separated columns: arXiv id, space-separated categories,
tokenized title, and tokenized abstract.
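
As a rough illustration of the output format described above, here is a minimal sketch of how one of these files could be read back for analysis. It is not part of the repository: the function name `read_abstracts` and the dictionary keys are hypothetical, and the sketch assumes UTF-8 text and exactly the four tab-separated columns in the order listed.

```python
import gzip
from pathlib import Path


def read_abstracts(path):
    """Yield one dict per line of a gzipped, tab-separated abstract file.

    Hypothetical helper: the column order is assumed from the README
    (arXiv id, categories, tokenized title, tokenized abstract).
    """
    # Assumption: the files are UTF-8 encoded text.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split into at most four fields so stray tabs stay in the abstract.
            arxiv_id, categories, title, abstract = line.split("\t", 3)
            yield {
                "id": arxiv_id,
                "categories": categories.split(),  # space-separated
                "title": title,        # tokenized text, left as one string
                "abstract": abstract,  # tokenized text, left as one string
            }


# Example path taken from the README; only read it if it actually exists.
example = Path("data/astro-ph/2007-05-10.txt.gz")
if example.exists():
    for record in read_abstracts(example):
        print(record["id"], record["categories"])
```

The title and abstract are kept as single strings because the README does not say how the tokens are delimited within each column.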

Credits
-------

Licensed under the terms of the MIT License (see `LICENSE`).
