https://github.com/rdmpage/treebase-studies
Adding identifiers to TreeBASE studies
https://github.com/rdmpage/treebase-studies
Last synced: 11 months ago
JSON representation
Adding identifiers to TreeBASE studies
- Host: GitHub
- URL: https://github.com/rdmpage/treebase-studies
- Owner: rdmpage
- Created: 2018-05-09T16:25:41.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-05-10T05:48:43.000Z (about 8 years ago)
- Last Synced: 2025-03-10T18:48:07.446Z (about 1 year ago)
- Language: PHP
- Size: 6.48 MB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# treebase-studies
Adding identifiers to TreeBASE studies
## OAI-PMH
Can fetch complete list of TreeBASE studies via OAI-PMH: https://treebase.org/treebase-web/top/oai?verb=ListRecords&metadataPrefix=oai_dc
For more details on this service see the [TreeBASE wiki](https://github.com/TreeBASE/treebase/wiki/OAI-PMH). Unfortunately this feed doesn't include DOIs as a separate field, so we then need to harvest each record individually :(
Fetch the OAI-PMH list of studies:
```
curl 'https://treebase.org/treebase-web/top/oai?verb=ListRecords&metadataPrefix=oai_dc' > oai.xml
```
## Harvest individual study metadata
Run fetch_studies.php to parse the OAI-PMH output to extract TreeBASE study identifiers and then grab the RSS file for each study using the URL http://purl.org/phylo/treebase/phylows/study/TB2: + . Cache the RSS in the “studies” folder.
## Mendeley
I set up a public group for TreeBASE papers on Mendeley https://www.mendeley.com/community/treebase/ but apparently this now requires you to log in to access it. For many of the papers I added DOIs and/or other links. The RIS file for this the 2604 papers in this group is available in this repository as treebase.ris.
## Add DOIs
Run parse-studies.php to read the RSS files for each study, convert to RIS, and try and add missing DOIs via CrossRef.
## Generate SQL files
To explore the data I generated an SQL dump of the metadata for the studies, with a separate column for the DOIs I added to help with debugging. Dumps of the data are in the data folder.