{"id":22656930,"url":"https://github.com/icelk/data-samples-heavy","last_synced_at":"2025-06-29T06:32:35.381Z","repository":{"id":104833285,"uuid":"484112522","full_name":"Icelk/data-samples-heavy","owner":"Icelk","description":"Some data samples used by my statistics and regression calculator.","archived":false,"fork":false,"pushed_at":"2022-04-21T16:06:18.000Z","size":13362,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-29T08:11:23.605Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Icelk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-04-21T15:47:17.000Z","updated_at":"2022-04-21T16:06:22.000Z","dependencies_parsed_at":null,"dependency_job_id":"e28ec25f-5305-40dd-8067-3b6f46f6436d","html_url":"https://github.com/Icelk/data-samples-heavy","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Icelk/data-samples-heavy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Icelk%2Fdata-samples-heavy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Icelk%2Fdata-samples-heavy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Icelk%2Fdata-samples-heavy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Icelk%2Fdata-samples-heavy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
owners/Icelk","download_url":"https://codeload.github.com/Icelk/data-samples-heavy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Icelk%2Fdata-samples-heavy/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262547079,"owners_count":23327096,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-09T10:16:54.549Z","updated_at":"2025-06-29T06:32:35.364Z","avatar_url":"https://github.com/Icelk.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"Some files with lots of data (more than 1000 samples, sometimes a few million) regarding certain phenomena.\n\nThese files are used for benchmarking my [statistics calculator](https://github.com/Icelk/std-dev/).\n\n# Data\n\n-   `air-pressure`, air pressure (hPa) at altitudes (m)\n    -   Fits an exponential function.\n    -   5 638 790 samples.\n    -   Taken from [a GitHub repo](https://github.com/stsievert/air-pressure-heights/blob/master/noaa-igra-monthly-avgs.zip)\n        (I'm using the file `noaa-igra-monthly-avgs/ghgt_12z-mly.txt`).\n    -   There are no negative values in this set.\n\n# Usage\n\nThe data is stored in tar archives with zstd compression.\n\nUse the `./extract` script to print the data to stdout.\nYou can then pipe it into a program (e.g. 
`std-dev` for regression, `xsel -b` to copy it) or\npipe it to a file.\n\n# Creating the data\n\nIf you've got a text file with loads of data, I suggest using the following strategy to format it.\n\nReplace the text in square brackets `[]` with your own values.\nTo extract a column, use `cat [your file] | awk '{print $[column number, starting from 1]}' \u003e first-column-of-data`.\nTo limit output, add `| head -n[number of lines]` before the pipe to awk.\nTo combine two columns again, run `paste [filename of first column] [filename of second column] \u003e data-output`.\n\nTo compress the result, use `tar -caf compressed.tzst data-output`. The `.tzst` extension matters: with `-a`, tar picks the compression (here Zstd) from the archive's file extension.\nAfterwards you can rename the file and change its extension.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ficelk%2Fdata-samples-heavy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ficelk%2Fdata-samples-heavy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ficelk%2Fdata-samples-heavy/lists"}