https://github.com/kentaroy47/arxiv-scraping

let's scrape arxiv papers by python
https://github.com/kentaroy47/arxiv-scraping

Last synced: 7 months ago
JSON representation

let's scrape arxiv papers by python

Host: GitHub
URL: https://github.com/kentaroy47/arxiv-scraping
Owner: kentaroy47
License: mit
Created: 2019-09-03T00:15:25.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2019-09-03T02:00:56.000Z (about 6 years ago)
Last Synced: 2025-02-01T03:45:26.334Z (8 months ago)
Language: Python
Size: 256 KB
Stars: 3
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# arxiv-scraping
let's scrape deep-learning arxiv papers by python!

Number of citations are acquired from semantic scholar.

## usage

python arxiv_scrape.py

## plot and analyze.

most arxiv papers have citations == 0.

This is kind of sad.

![hist](./histgram_of_citations.png)

Out of 6000 acquired papers, almost half of them have 0 citations.

80% of the papers have less than 10 citations (so don't be depressed if you don't get any citations! I guess it's pretty normal.)

This is the data I used for the histogram.

```python
matplotlib.pyplt.hist(cite, range=(0, 100))

cite=[2.904e+03, 5.870e+02, 3.660e+02, 2.100e+02, 2.090e+02, 1.450e+02,
1.360e+02, 9.200e+01, 9.200e+01, 7.600e+01, 4.400e+01, 4.300e+01,
4.100e+01, 4.300e+01, 3.700e+01, 3.500e+01, 3.600e+01, 3.200e+01,
2.200e+01, 3.000e+01, 3.200e+01, 2.500e+01, 2.500e+01, 2.300e+01,
2.100e+01, 2.200e+01, 1.500e+01, 2.100e+01, 1.200e+01, 1.800e+01,
1.700e+01, 1.300e+01, 1.100e+01, 1.500e+01, 1.000e+01, 1.700e+01,
1.500e+01, 1.000e+01, 9.000e+00, 1.600e+01, 8.000e+00, 5.000e+00,
8.000e+00, 7.000e+00, 8.000e+00, 5.000e+00, 6.000e+00, 7.000e+00,
9.000e+00, 6.000e+00, 9.000e+00, 5.000e+00, 8.000e+00, 2.000e+00,
3.000e+00, 3.000e+00, 3.000e+00, 6.000e+00, 6.000e+00, 5.000e+00,
7.000e+00, 9.000e+00, 9.000e+00, 7.000e+00, 5.000e+00, 1.000e+01,
5.000e+00, 3.000e+00, 3.000e+00, 5.000e+00, 3.000e+00, 5.000e+00,
2.000e+00, 4.000e+00, 1.000e+00, 2.000e+00, 2.000e+00, 4.000e+00,
6.000e+00, 2.000e+00, 5.000e+00, 0.000e+00, 1.000e+00, 4.000e+00,
4.000e+00, 3.000e+00, 3.000e+00, 1.000e+00, 2.000e+00, 3.000e+00,
1.000e+00, 1.000e+01, 1.000e+00, 3.000e+00, 2.000e+00, 5.000e+00,
2.000e+00, 3.000e+00, 1.000e+00]
```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/kentaroy47/arxiv-scraping

Awesome Lists containing this project

README