{"id":20074744,"url":"https://github.com/greenelab/opencitations","last_synced_at":"2025-05-05T21:32:15.051Z","repository":{"id":79359716,"uuid":"100521773","full_name":"greenelab/opencitations","owner":"greenelab","description":"Processing OpenCitations Data","archived":false,"fork":false,"pushed_at":"2017-08-17T20:22:44.000Z","size":21,"stargazers_count":20,"open_issues_count":0,"forks_count":0,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-04-09T04:41:40.967Z","etag":null,"topics":["citations","dataset","digital-object-identifier","doi","notebook","opencitations"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc0-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/greenelab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-08-16T18:46:57.000Z","updated_at":"2025-02-23T11:18:53.000Z","dependencies_parsed_at":"2023-03-12T07:49:47.891Z","dependency_job_id":null,"html_url":"https://github.com/greenelab/opencitations","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Fopencitations","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Fopencitations/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Fopencitations/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Fopencitations/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/owners/greenelab","download_url":"https://codeload.github.com/greenelab/opencitations/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252580076,"owners_count":21771257,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["citations","dataset","digital-object-identifier","doi","notebook","opencitations"],"created_at":"2024-11-13T14:54:09.495Z","updated_at":"2025-05-05T21:32:15.038Z","avatar_url":"https://github.com/greenelab.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Processing OpenCitations Data\n\nThis repository processes the [OpenCitations data](http://opencitations.net/download) to make it more user-friendly and concise.\nThe primary output is [`data/citations-doi.tsv.xz`](data/citations-doi.tsv.xz), which is a catalog of DOI-to-DOI citations.\nThe file is formatted as follows:\n\n| source_doi | target_doi |\n|------------|------------|\n| 10.1002/14651858.cd002244.pub4 | 10.1001/archneur.1990.00530120057010 |\n| 10.1002/14651858.cd002244.pub4 | 10.1002/14651858.cd002244 |\n| 10.1002/14651858.cd002244.pub4 | 10.1002/14651858.cd002244.pub2 |\n| 10.1002/14651858.cd012199 | 10.1001/jama.295.6.676 |\n| 10.1002/14651858.cd012199 | 10.1001/jama.299.16.1937 |\n| 10.1002/14651858.cd012199 | 10.1002/14651858.cd000371.pub6 |\n| 10.1002/14651858.cd012199 | 10.1002/14651858.cd009382.pub2 |\n\nAll DOIs are lowercase.\nQuality control steps were performed on the DOIs to remove clearly incorrect DOIs.\nHowever, for best results, we recommend users intersect these 
DOIs with a catalog of valid DOIs to remove any remaining errant DOIs.\n\n## Execution\n\nThe OpenCitations data is downloaded and processed by running the notebooks in this repository sequentially.\nTo update the pipeline to use newer versions of OpenCitations data, update the figshare article IDs in [`01.download.ipynb`](01.download.ipynb).\n\n## Environment\n\nThis repository uses [conda](http://conda.pydata.org/docs/) to manage its environment as specified in [`environment.yml`](environment.yml).\nInstall the environment with:\n\n```sh\nconda env create --file=environment.yml\n```\n\nThen use `source activate opencitations` and `source deactivate` to activate or deactivate the environment.\nOn Windows, use `activate opencitations` and `deactivate` instead.\n\nIn addition to the conda environment, users will need to install the [Disk ARchive](http://dar.linux.free.fr/) (`dar`) utility on their system.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreenelab%2Fopencitations","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgreenelab%2Fopencitations","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreenelab%2Fopencitations/lists"}