https://github.com/genspectrum/nextclade-datasets
- Host: GitHub
- URL: https://github.com/genspectrum/nextclade-datasets
- Owner: GenSpectrum
- Created: 2024-07-14T17:19:08.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-04-03T22:32:40.000Z (about 2 months ago)
- Last Synced: 2025-04-03T22:34:37.615Z (about 2 months ago)
- Size: 4.04 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
# nextclade-datasets
This repository hosts a GenSpectrum-maintained Nextclade dataset server, created following the [dataset server maintenance docs](https://github.com/nextstrain/nextclade_data/blob/master/docs/dataset-server-maintenance.md).
You can test the server by pasting the following URL into an incognito browser window:
```
https://clades.nextstrain.org/?dataset-server=https://raw.githubusercontent.com/genspectrum/nextclade-datasets/main/data
```
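To try out a fork or a feature branch before merging, the same URL can be assembled from its parts. This is a minimal sketch, assuming the datasets live under `data/` as they do in this repository; the function name `nextclade_test_url` and its arguments are hypothetical helpers and not part of Nextclade.

```shell
# Hypothetical helper: print the Nextclade Web URL that points at a dataset
# server hosted in a GitHub repository. Assumes the datasets live under data/.
# Arguments (all optional): owner, repository, branch.
nextclade_test_url() {
  local owner="${1:-genspectrum}"
  local repo="${2:-nextclade-datasets}"
  local branch="${3:-main}"
  printf 'https://clades.nextstrain.org/?dataset-server=https://raw.githubusercontent.com/%s/%s/%s/data\n' \
    "$owner" "$repo" "$branch"
}
```

Calling `nextclade_test_url` without arguments prints the URL shown above; passing a third argument such as `my-branch` swaps in that branch name.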
## How to add new datasets
1. Create a dataset following [nextclade's instructions](https://github.com/nextstrain/nextclade_data/blob/master/docs/dataset-creation-guide.md).
2. Update `index.json`: it should include the details from each dataset's `pathogen.json` file. In addition, `index.json` expects datasets to be versioned. For simplicity, set the version to `unreleased` and keep each dataset in a subdirectory called `unreleased`.
3. Zip the contents of the dataset into `dataset.zip`. This is the file that Nextclade downloads and unzips before use.
```
# Run from the directory that contains seg1 ... seg8.
for i in {1..8}; do
  cd "seg$i/unreleased" || continue
  # Remove any stale archive from the dataset directory before re-zipping.
  rm -f dataset.zip
  zip -r dataset.zip *
  cd - > /dev/null
done
```
Note that steps 2 and 3 are performed automatically by the CI when you create an official Nextclade dataset, using the [rebuild script](https://github.com/nextstrain/nextclade_data/blob/master/scripts/rebuild/).
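Before committing, it can help to confirm that every segment directory actually contains both files the steps above rely on. The following is a minimal sketch, assuming the eight-segment `seg$i/unreleased` layout used in the zip loop; `check_segments` is a hypothetical helper name, and the check only looks for file presence, not for valid contents.

```shell
# Hypothetical helper: report segments whose unreleased/ directory is missing
# pathogen.json or dataset.zip. Run from the directory that contains
# seg1 ... seg8. Returns non-zero if anything is missing.
check_segments() {
  local missing=0
  local i f
  for i in $(seq 1 8); do
    for f in pathogen.json dataset.zip; do
      if [ ! -f "seg$i/unreleased/$f" ]; then
        echo "missing: seg$i/unreleased/$f"
        missing=1
      fi
    done
  done
  return "$missing"
}
```

Running `check_segments` prints one line per missing file and prints nothing when all sixteen files are present, so it can be chained as `check_segments && git commit ...`.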
Download the H5N1 datasets as follows:
```
nextclade_dataset_server=https://raw.githubusercontent.com/genspectrum/nextclade-datasets/main/data
for i in {1..8}; do
  nextclade_dataset_name=flu/h5n1/seg$i
  nextclade3 dataset get --name "$nextclade_dataset_name" --server "$nextclade_dataset_server" --output-dir "output$i"
done
```