https://github.com/caltechlibrary/dataset-instruction
Instructional content for the dataset package
- Host: GitHub
- URL: https://github.com/caltechlibrary/dataset-instruction
- Owner: caltechlibrary
- License: other
- Created: 2018-02-23T16:45:56.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2018-03-02T15:22:25.000Z (over 8 years ago)
- Last Synced: 2025-09-09T21:41:10.923Z (10 months ago)
- Language: HTML
- Homepage: https://caltechlibrary.github.io/dataset-instruction
- Size: 127 KB
- Stars: 1
- Watchers: 6
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.html
- License: LICENSE.html
Training on Dataset tools
=======
*Content Contributors: Robert Doiel, Tom Morrell*
*Lesson Maintainers: Robert Doiel, Tom Morrell*
**Lesson status: In Development**
## What you will learn:
* Identify the structure of a JSON file
* Gather data from an API
* Use the basic functions of dataset
* Combine data using dataset to collect citations for a publications list
* Export to and import from a Google Sheet
* Index and search over a large collection of data
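
As a rough illustration of the first objective, the sketch below parses a small JSON document and lists the type of every value in it. The sample record and the `describe` helper are hypothetical, made up for this example; they are not taken from the lesson, from _dataset_, or from any real API response. The dot-path labels are only a display convention chosen here.

```python
import json

# Hypothetical record, loosely shaped like a citation (an assumption for
# illustration only; not a real API response).
SAMPLE = """
{
  "title": "An example article",
  "year": 2018,
  "open_access": true,
  "authors": [
    {"family": "Doe", "given": "Jane"},
    {"family": "Roe", "given": "Richard"}
  ],
  "journal": {"name": "Example Journal", "volume": "12"}
}
"""


def describe(value, path="."):
    """Return (path, type name) pairs for every value in parsed JSON."""
    rows = [(path, type(value).__name__)]
    if isinstance(value, dict):
        # JSON objects become Python dicts: recurse into each key.
        for key, child in value.items():
            rows.extend(describe(child, path.rstrip(".") + "." + key))
    elif isinstance(value, list):
        # JSON arrays become Python lists: recurse into each position.
        for index, child in enumerate(value):
            rows.extend(describe(child, f"{path}[{index}]"))
    return rows


record = json.loads(SAMPLE)
for path, kind in describe(record):
    print(f"{path}: {kind}")
```

Objects (`dict`) and arrays (`list`) are the two container types; strings, numbers, booleans and `null` are the leaves they hold.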
## Topics:
1. [Intro](00-intro-json-apis.html)
2. [Basic Dataset](01-basic-dataset.html)
3. [Working with Larger Amounts of Data](02-large-data.html)
## Requirements
This lesson requires basic familiarity with the bash shell, similar to the
experience gained through the
[Software Carpentry shell lesson](http://swcarpentry.github.io/shell-novice/).
You'll need a bash shell installed; to set one up, you can follow
[these instructions](https://swcarpentry.github.io/workshop-template/#setup).
Two tool collections developed at Caltech Library will also be used:
[datatools](https://caltechlibrary.github.io/datatools/) and
[dataset](https://caltechlibrary.github.io/dataset/). _datatools_ is a collection
of tools for working with CSV, XLSX and JSON content; from it we will use
a program called _jsonmunge_ to extract and re-format JSON content.
_dataset_ is a data management tool. Download the latest release of each from the
[datatools releases page](https://github.com/caltechlibrary/datatools/releases/latest)
and the [dataset releases page](https://github.com/caltechlibrary/dataset/releases/latest).
## References
+ data formats
    + JSON documentation, https://www.json.org
    + simple index maps, https://caltechlibrary.github.io/dataset/docs/dsindexer/defining-index.html
+ data sources
    + Dimension API presentation (see slide 3), https://figshare.com/s/3c8f0284e8e51718c1b2
    + CrossRef REST API, https://github.com/CrossRef/rest-api-doc
    + CrossRef Content Negotiation, https://citation.crosscite.org/docs.html
+ programs
    + curl documentation, https://curl.haxx.se/
    + dataset, https://caltechlibrary.github.io/dataset/
    + datatools, https://caltechlibrary.github.io/datatools/