{"id":23848885,"url":"https://github.com/qsbase/qs2_analysis","last_synced_at":"2026-01-31T05:34:00.385Z","repository":{"id":262330969,"uuid":"886903603","full_name":"qsbase/qs2_analysis","owner":"qsbase","description":"Benchmarks and analyses for the qs2 package","archived":false,"fork":false,"pushed_at":"2024-11-11T20:56:46.000Z","size":6664,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-12T17:05:47.779Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/qsbase.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-11T20:18:15.000Z","updated_at":"2024-11-12T22:26:40.000Z","dependencies_parsed_at":"2024-11-11T21:35:34.697Z","dependency_job_id":"b28d0b85-6a9d-462f-a0ed-3018ed4ef612","html_url":"https://github.com/qsbase/qs2_analysis","commit_stats":null,"previous_names":["qsbase/qs2_analysis"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/qsbase/qs2_analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qsbase%2Fqs2_analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qsbase%2Fqs2_analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qsbase%2Fqs2_analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qsbase%2Fqs2_analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/qsb
ase","download_url":"https://codeload.github.com/qsbase/qs2_analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qsbase%2Fqs2_analysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28930405,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-31T04:05:25.756Z","status":"ssl_error","status_checked_at":"2026-01-31T04:02:35.005Z","response_time":128,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-02T21:38:36.199Z","updated_at":"2026-01-31T05:34:00.363Z","avatar_url":"https://github.com/qsbase.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Analyses and Benchmarks for the qs2 R package\n\nThis package provides a collection of analyses and benchmarks for the [qs2](https://github.com/qsbase/qs2) R package. \n\n20 datasets were collected in total. 16 were used for training in order to optimize performance with respect to hyper-parameters and design choices. The remaining 4 datasets were used for benchmarks and evaluation of the performance of the qs2 package. \n\nAll datasets are openly licensed. 
Please feel free to use this collection of datasets and cite the qs2 package!\n\n# Benchmark plots\n\n## Save time vs file size (Ubuntu)\n![save_time_vs_file_size_ubuntu](benchmarks/plots/ubuntu_write_benchmarks.png)\n\n## Read time vs save time (Ubuntu)\n![read_time_vs_save_time_ubuntu](benchmarks/plots/ubuntu_read_benchmarks.png)\n\n## Save time vs file size (Windows)\n![save_time_vs_file_size_windows](benchmarks/plots/windows_write_benchmarks.png)\n\n## Read time vs save time (Windows)\n![read_time_vs_save_time_windows](benchmarks/plots/windows_read_benchmarks.png)\n\n# Datasets\n\n## Training\n\n### 1. GAIA celestial pseudocolor dataset\n- **License**: Other ([link](https://www.cosmos.esa.int/web/gaia-users/license))\n- **Description**: Data used for a pseudocolor plot of galaxy stars.\n\n### 2. enwik8 (Wikipedia data)\n- **License**: Unspecified\n- **Description**: A single column dataset with the first 100 million bytes of an English Wikipedia dump.\n\n### 3. T-cell data\n- **License**: CC BY 4.0\n- **Reference**: [Adaptive Biotech COVID-2020](https://clients.adaptivebiotech.com/pub/covid-2020)\n- **Description**: A large-scale database of T-cell receptor beta (TCRβ) sequences for SARS-CoV-2 studies.\n\n### 4. MNIST\n- **License**: Artistic-2.0\n- **Reference**: [dslabs package](https://cran.r-project.org/web/packages/dslabs/index.html)\n- **Description**: Handwritten digits data for digit recognition.\n\n### 5. RNA-Seq GTEx heart data\n- **License**: Artistic-2.0\n- **Reference**: [recount3 package](https://bioconductor.org/packages/release/bioc/html/recount3.html)\n- **Description**: Gene expression counts for human heart samples.\n\n### 6. EU Copernicus ERA5 wind data\n- **License**: Other ([link](https://www.copernicus.eu/en/access-data/copyright-and-licences))\n- **Reference**: [Copernicus Climate Data Store](https://cds.climate.copernicus.eu/)\n- **Description**: Monthly means of wind data at 10 meters height for 2023.\n\n### 7. 
Berkeley 2010-2019 global temperature\n- **License**: CC BY-NC 4.0\n- **Reference**: [Berkeley Earth](https://berkeleyearth.org/data/)\n- **Description**: Global temperature data from 2010 to 2019.\n\n### 8. OpenStreetMap (OSM) data for Oahu\n- **License**: Open Data Commons ODbL\n- **Reference**: [OSM Downloading Data](https://wiki.openstreetmap.org/wiki/Downloading_data)\n- **Description**: Map data of the Oahu region from OpenStreetMap.\n\n### 9. NYC vehicle collisions\n- **License**: Public access\n- **Description**: Dataset on motor vehicle collisions and crashes in NYC.\n\n### 10. Methylation volcano data\n- **License**: Artistic-2.0\n- **Reference**: [methylationArrayAnalysis](https://bioconductor.org/packages/release/workflows/html/methylationArrayAnalysis.html)\n- **Description**: DNA methylation data for epigenetic studies.\n\n### 11. Clifford attractor\n- **License**: N/A\n- **Reference**: [Clifford attractor](https://paulbourke.net/fractals/clifford/)\n- **Description**: Fractal data generated using the Clifford attractor.\n\n### 12. Twitter Sentiment140\n- **License**: N/A\n- **Reference**: Go, A., Bhayani, R., and Huang, L., 2009\n- **Description**: Sentiment analysis data from Twitter.\n\n### 13. Steam game database 2024\n- **License**: MIT\n- **Reference**: [Steam Games Dataset](https://www.kaggle.com/datasets/artermiloff/steam-games-dataset)\n- **Description**: A dataset of games on the Steam platform.\n\n### 14. Protein structure PISCES\n- **License**: DbCL v1.0\n- **Reference**: Wang, Guoli, and Roland L. Dunbrack Jr. \"PISCES\"\n- **Description**: Data on protein secondary structure.\n\n### 15. DC real estate listings 2024\n- **License**: CC BY-NC-SA 4.0\n- **Reference**: [Washington D.C. housing market dataset](https://www.kaggle.com/datasets/datadetective08/washington-d-c-housing-market-2024)\n- **Description**: Real estate listings in Washington, D.C.\n\n### 16. 
NYSE stock prices (1962-2024)\n- **License**: Apache 2.0\n- **Reference**: [Stock prices dataset](https://www.kaggle.com/datasets/eren2222/nasdaq-nyse-nyse-a-otc-daily-stock-1962-2024)\n- **Description**: Daily stock prices for NYSE stocks.\n\n## Testing\n\n### 17. 1000 Genomes non-coding VCF\n- **License**: CC BY-NC-SA 3.0\n- **Reference**: 1000 Genomes Project Consortium, Nature 526\n- **Description**: Annotated VCF files of non-coding regions in human genomes.\n\n### 18. B-cell AIRR data (\"Pet shop mouse 3\")\n- **License**: N/A\n- **Reference**: [Project page](https://www.ebi.ac.uk/ena/browser/view/PRJEB18631)\n- **Description**: Data on antibody/B-cell and T-cell receptor repertoires.\n\n### 19. IP location dataset (2023)\n- **License**: CC BY 4.0\n- **Reference**: [Global IP dataset](https://www.kaggle.com/datasets/joebeachcapital/global-ip-dataset-by-location-2023)\n- **Description**: Geolocation data for IP addresses globally.\n\n### 20. Netflix movie ratings\n- **License**: CC0 Public Domain\n- **Reference**: [Netflix movie rating dataset](https://www.kaggle.com/datasets/rishitjavia/netflix-movie-rating-dataset)\n- **Description**: Movie ratings data from Netflix.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqsbase%2Fqs2_analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqsbase%2Fqs2_analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqsbase%2Fqs2_analysis/lists"}