https://github.com/zietzm/biorxiv-co-authorship
BioRxiv co-authorship network
https://github.com/zietzm/biorxiv-co-authorship
Last synced: 3 months ago
JSON representation
BioRxiv co-authorship network
- Host: GitHub
- URL: https://github.com/zietzm/biorxiv-co-authorship
- Owner: zietzm
- License: bsd-2-clause
- Created: 2019-03-23T14:38:51.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2019-03-23T21:54:03.000Z (about 6 years ago)
- Last Synced: 2025-01-13T01:34:41.015Z (4 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 12.8 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# BioRxiv co-authorship network
This repository houses a network of all co-authorship relations from papers on [bioRxiv](http://biorxiv.org) up to March 23, 2019, including date of publication and corresponding DOI and URL. All papers have basic metadata stored in `articles.json`, in the following format:
```json
[
{
"date": "YYYY/MM/DD",
"authors": [
[
"LAST",
"FIRST"
], ...
],
"url": "https://doi.org/10.1101/BIORXIV_ID"
}, ...
]```
Paper authors are then given unique numerical IDs, and all co-author relationships (the network itself) are stored in `biorxiv_coauthor.tsv.xz`, in the following format: [`id_a`, `id_b`, `year`]. This format means that author pairs with multiple co-authorships will have all existing co-authorships in the network. The rationale for this decision is so that the network can be time-resolved later. Note, "authors" in the context of this repository includes only individuals, meaning consortia and named groups are excluded from the network.
If you require further metadata for each paper, such as title or abstracts, I recommend using the DOIs and the [CrossRef API](https://github.com/CrossRef/rest-api-doc). DOI extraction from the URLs is simple, and is done in `2.construct_edges.ipynb` using the following regex: `'10.1101/[0-9]+'`.