{"id":17456621,"url":"https://github.com/martinthoma/hasy","last_synced_at":"2025-07-07T18:33:56.323Z","repository":{"id":37784808,"uuid":"79312964","full_name":"MartinThoma/HASY","owner":"MartinThoma","description":"HASY dataset","archived":false,"fork":false,"pushed_at":"2023-10-03T22:37:57.000Z","size":3301,"stargazers_count":34,"open_issues_count":21,"forks_count":11,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-28T10:50:18.533Z","etag":null,"topics":["dataset","machine-learning","ocr","optical-character-recognition","science","symbols"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MartinThoma.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-01-18T07:04:01.000Z","updated_at":"2025-01-20T00:07:24.000Z","dependencies_parsed_at":"2024-10-20T19:11:59.250Z","dependency_job_id":null,"html_url":"https://github.com/MartinThoma/HASY","commit_stats":{"total_commits":146,"total_committers":3,"mean_commits":"48.666666666666664","dds":0.0547945205479452,"last_synced_commit":"6703be918a9369654c9699e5ca7df617d4a38f2c"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MartinThoma%2FHASY","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MartinThoma%2FHASY/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MartinThoma%2FHASY/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/MartinThoma%2FHASY/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MartinThoma","download_url":"https://codeload.github.com/MartinThoma/HASY/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248974662,"owners_count":21192186,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataset","machine-learning","ocr","optical-character-recognition","science","symbols"],"created_at":"2024-10-18T02:48:27.830Z","updated_at":"2025-04-14T22:36:53.180Z","avatar_url":"https://github.com/MartinThoma.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![PyPI version](https://badge.fury.io/py/hasy.svg)](https://badge.fury.io/py/hasy)\n[![Python Support](https://img.shields.io/pypi/pyversions/hasy.svg)](https://pypi.org/project/hasy/)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n![GitHub last commit](https://img.shields.io/github/last-commit/MartinThoma/HASY)\n![GitHub commits since latest release (by SemVer)](https://img.shields.io/github/commits-since/MartinThoma/HASY/0.3.1)\n[![CodeFactor](https://www.codefactor.io/repository/github/martinthoma/HASY/badge/master)](https://www.codefactor.io/repository/github/martinthoma/HASY/overview/master)\n\nPlease refer to the [HASY paper](https://arxiv.org/abs/1701.08380) for details\nabout the dataset. 
If you want to report problems with the HASY dataset, please\nsend an email to info@martin-thoma.de or file an issue at\nhttps://github.com/MartinThoma/HASY\n\nErrata are listed in the git repository as well as the actual `hasy` package.\n\n\n## Contents\n\nThe contents of the [HASYv2 dataset](https://zenodo.org/record/259444) are:\n\n* `hasy-data`: 168236 png images, each 32px x 32px\n* `hasy-data-labels.csv`: Labels for all images.\n* `classification-task`: 10 folders (fold-1, fold-2, ..., fold-10) which\n  contain a `train.csv` and a `test.csv` each. Every line of the csv files\n  points to one of the png images (relative to itself). If those files are\n  used, then the `hasy-data-labels.csv` is not necessary.\n* `verification-task`: A `train.csv` and three different test files. All files\n  should be used in exactly the same way, but the accuracy should be reported\n  for each one.\n  The task is to decide for a pair of two 32px x 32px images if they belong\n  to the same symbol (binary classification).\n* `symbols.csv`: All classes\n* `README.txt`: This file\n\n\n## How to evaluate\n\n### Classification Task\n\nUse the pre-defined 10 folds for 10-fold cross-validation. Report the\naverage accuracy as well as the minimum and maximum accuracy.\n\n\n### Verification Task\n\nUse the `train.csv` for training. Use `test-v1.csv`, `test-v2.csv`, and\n`test-v3.csv` for evaluation. 
Report TP, TN, FP, FN and accuracy for each\nof the three test groups.\n\n\n## hasy package\n\n`hasy` can be used in two ways: (1) as a shell script (2) as a Python\nmodule.\n\nIf you want to get more information about the shell script options, execute\n\n```\n$ hasy --help\nusage: hasy [-h] [--dataset DATASET] [--verify] [--overview] [--analyze_color]\n            [--class_distribution] [--distances] [--pca] [--variance]\n            [--correlation] [--count-users] [--analyze-cm CM]\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --dataset DATASET     specify which data to use (default: None)\n  --verify              verify PNG files (default: False)\n  --overview            Get overview of data (default: False)\n  --analyze_color       Analyze the color distribution (default: False)\n  --class_distribution  Analyze the class distribution (default: False)\n  --distances           Analyze the euclidean distance distribution (default:\n                        False)\n  --pca                 Show how many principal components explain 90% / 95% /\n                        99% of the variance (default: False)\n  --variance            Analyze the variance of features (default: False)\n  --correlation         Analyze the correlation of features (default: False)\n  --count-users         Count how many different users have created the\n                        dataset (default: False)\n  --analyze-cm CM       Analyze a confusion matrix in JSON format. 
(default:\n                        False)\n```\n\n\nIf you want to use `hasy` as a Python package, see\n\n    python -c \"import hasy.hasy_tools;help(hasy.hasy_tools)\"\n\n\n## Changelog\n\n* 14.05.2020, hasy Python package: Major refactoring of this repository\n* 24.01.2017, HASYv2: Points were not rendered in HASYv1; improved hasy_tools\n                      https://doi.org/10.5281/zenodo.259444\n* 18.01.2017, HASYv1: Initial upload\n                      https://doi.org/10.5281/zenodo.250239\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmartinthoma%2Fhasy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmartinthoma%2Fhasy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmartinthoma%2Fhasy/lists"}