{"id":15517072,"url":"https://github.com/deric/handl-data-generators","last_synced_at":"2025-04-04T10:26:59.309Z","repository":{"id":32716282,"uuid":"36305976","full_name":"deric/handl-data-generators","owner":"deric","description":"Clustering data generators","archived":false,"fork":false,"pushed_at":"2015-11-23T12:33:35.000Z","size":1190,"stargazers_count":3,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-09T20:42:19.842Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/deric.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-05-26T15:39:08.000Z","updated_at":"2017-10-22T12:01:00.000Z","dependencies_parsed_at":"2022-09-11T10:01:20.989Z","dependency_job_id":null,"html_url":"https://github.com/deric/handl-data-generators","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deric%2Fhandl-data-generators","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deric%2Fhandl-data-generators/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deric%2Fhandl-data-generators/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deric%2Fhandl-data-generators/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/deric","download_url":"https://codeload.github.com/deric/handl-data-generators/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.co
m","kind":"github","repositories_count":247159382,"owners_count":20893603,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-02T10:11:13.419Z","updated_at":"2025-04-04T10:26:59.289Z","avatar_url":"https://github.com/deric.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# J. Handl: data generators\n\nSynthetic data generators for creating datasets with Gaussian distribution. The code was taken from the [official website](http://personalpages.manchester.ac.uk/mbs/Julia.Handl/generators.html) and slightly modified for modern compilers.\n\n## Requirements\n\nYou'll need the `g++` compiler.\n\n  * Debian: `apt-get install build-essential` should be enough\n\nThen build with:\n\n```\n$ make\n```\n\n## mult_generator\n\nCurrently takes no parameters; all settings are hard-coded as constants:\n\n```c\n#define DIM 2        // dimensionality of the data\n#define NUM 40      // number of clusters\n\n#define MAXMU 10    // mean in each dimension is in range [0,MAXMU]\n#define MINMU -10\n#define MINSIGMA 0\n#define MAXSIGMA 20*sqrt(DIM) // standard deviation (to be added on top\n// of row sum in each dimension is in range [0,MAXSIGMA]\n#define MAXSIZE 100  // size of each cluster is in range [MINSIZE,MAXSIZE]\n#define MINSIZE 10\n#define RUNS 10      // number of data sets to be generated\n```\n\nTo generate the data sets, simply run:\n\n```\n$ ./mult_generator\n```\n\n![mult_generator](https://raw.githubusercontent.com/deric/handl-data-generators/screens/img/2d-4c-no9.png)\n\n## elly\n\nEllipsoid generator:\n\n```\n$ ./elly [-k \u003cnclust\u003e] [-d 
\u003cdimension\u003e] [-s \u003cseed\u003e]\n```\nwhere all parameters are optional and:\n  * `\u003cnclust\u003e` is an integer \u003e= 2\n  * `\u003cdimension\u003e` is an integer \u003e= 2\n  * `\u003cseed\u003e` is a long int.\n\n\n![elly example](https://raw.githubusercontent.com/deric/handl-data-generators/screens/img/elly-2d10c13s.png)\n\n## cure\n\nCURE data set generator. For more details, see Guha, Sudipto, Rajeev Rastogi, and Kyuseok Shim. \"CURE: an efficient clustering algorithm for\nlarge databases.\" ACM SIGMOD Record. Vol. 27. No. 2. ACM, 1998.\n\nThe distribution of data points is only approximated.\n\n```\n$ ./cure -n \u003cnpoints\u003e [-d \u003cdimension\u003e] [-s \u003cseed\u003e] [-l \u003cx/y min\u003e] [-m \u003cx/y max\u003e] [-t \u003ctype of data\u003e]\n```\nwhere:\n  * `-l` minimal x/y value\n  * `-m` maximal x/y value\n  * `-t` type of dataset, currently supports values 0-2\n\n![cure t0](https://raw.githubusercontent.com/deric/handl-data-generators/screens/img/cure-t0-2k-2d.png)\n\n![cure t1](https://raw.githubusercontent.com/deric/handl-data-generators/screens/img/cure-t1-2k-2d.png)\n\n![cure t2](https://raw.githubusercontent.com/deric/handl-data-generators/screens/img/cure-t2-4k.png)\n\n## disk\n\nThe disk-in-disk dataset consists of two clusters: a circle and an annulus around it.\n\n\n![disk-4600](https://raw.githubusercontent.com/deric/handl-data-generators/screens/img/disk-4600.png)\n\n## Authors\n\n  * Julia Handl\n  * Joshua Knowles\n  * John Burkardt\n  * Tomas Barton\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fderic%2Fhandl-data-generators","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fderic%2Fhandl-data-generators","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fderic%2Fhandl-data-generators/lists"}