{"id":15893315,"url":"https://github.com/nunofachada/generatedata","last_synced_at":"2025-08-14T07:31:32.320Z","repository":{"id":16504032,"uuid":"19256939","full_name":"nunofachada/generateData","owner":"nunofachada","description":"Generates 2D data clusters","archived":false,"fork":false,"pushed_at":"2023-01-26T17:48:21.000Z","size":36,"stargazers_count":2,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-02T14:22:06.650Z","etag":null,"topics":["axis","center","clustering","dataset","dataset-generation","datasets","distance","matlab","octave","octave-functions","octave-scripts","slope","totalpoints"],"latest_commit_sha":null,"homepage":"","language":"MATLAB","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nunofachada.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-04-28T23:05:22.000Z","updated_at":"2023-01-26T17:46:13.000Z","dependencies_parsed_at":"2023-02-14T20:16:01.271Z","dependency_job_id":null,"html_url":"https://github.com/nunofachada/generateData","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/nunofachada/generateData","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nunofachada%2FgenerateData","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nunofachada%2FgenerateData/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nunofachada%2FgenerateData/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nunofachada%2FgenerateData/manifests","owner_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/owners/nunofachada","download_url":"https://codeload.github.com/nunofachada/generateData/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nunofachada%2FgenerateData/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270384116,"owners_count":24574526,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-14T02:00:10.309Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["axis","center","clustering","dataset","dataset-generation","datasets","distance","matlab","octave","octave-functions","octave-scripts","slope","totalpoints"],"created_at":"2024-10-06T08:09:53.821Z","updated_at":"2025-08-14T07:31:32.076Z","avatar_url":"https://github.com/nunofachada.png","language":"MATLAB","readme":"[![Latest release](https://img.shields.io/github/release/fakenmc/generateData.svg)](https://github.com/fakenmc/generateData/releases)\n[![MIT Licence](https://img.shields.io/badge/license-MIT-yellowgreen.svg)](https://opensource.org/licenses/MIT/)\n[![View Generate Data for Clustering on File Exchange](https://www.mathworks.com/matlabcentral/images/matlab-file-exchange.svg)](https://www.mathworks.com/matlabcentral/fileexchange/37435-generate-data-for-clustering)\n\n# generateData\n\n## Summary\n\nA MATLAB/Octave function which generates 2D data clusters. 
Data is\ncreated along straight lines, which can be more or less parallel\ndepending on the selected input parameters.\n\n## Synopsis\n\n```MATLAB\n[data, clustPoints, idx, centers, angles, lengths] = ...\n    generateData(angleMean, angleStd, numClusts, xClustAvgSep, yClustAvgSep, ...\n                 lengthMean, lengthStd, lateralStd, totalPoints, ...)\n```\n\n## Input parameters\n\n### Required parameters\n\nParameter      | Description\n-------------- | -----------\n`angleMean`    | Mean angle in radians of the lines on which clusters are based. Angles are drawn from the normal distribution.\n`angleStd`     | Standard deviation of line angles.\n`numClusts`    | Number of clusters (and therefore of lines) to generate.\n`xClustAvgSep` | Average separation of line centers along the X axis.\n`yClustAvgSep` | Average separation of line centers along the Y axis.\n`lengthMean`   | Mean length of the lines on which clusters are based. Line lengths are drawn from the folded normal distribution.\n`lengthStd`    | Standard deviation of line lengths.\n`lateralStd`   | Cluster \"fatness\", i.e., the standard deviation of the distance from each point to its projection on the line. The way this distance is obtained is controlled by the optional `'pointOffset'` parameter.\n`totalPoints`  | Total points in generated data. 
These will be randomly divided between clusters using the half-normal distribution with unit standard deviation.\n\n### Optional named parameters\n\nParameter name | Parameter values   | Default value | Description\n-------------- | ------------------ | ------------- | -----------\n`allowEmpty`   | `true`, `false`    | `false`       | Allow empty clusters?\n`pointDist`    | `'unif'`, `'norm'` | `'unif'`      | Specifies the distribution of points along lines, with two possible values: 1) `'unif'` distributes points uniformly along lines; or, 2) `'norm'` distributes points along lines using a normal distribution (the line center is the mean and the line length is equal to 3 standard deviations).\n`pointOffset`  | `'1D'`, `'2D'`     | `'2D'`        | Controls how points are created from their projections on the lines, with two possible values: 1) `'1D'` places points on a second line perpendicular to the cluster line using a normal distribution centered at their intersection; or, 2) `'2D'` places points using a bivariate normal distribution centered at the point projection.\n\n## Return values\n\nValue         | Description\n------------- | -----------\n`data`        | Matrix (`totalPoints` x *2*) with the generated data.\n`clustPoints` | Vector (`numClusts` x *1*) containing the number of points in each cluster.\n`idx`         | Vector (`totalPoints` x *1*) containing the cluster index of each point.\n`centers`     | Matrix (`numClusts` x *2*) containing the centers of the lines from which clusters were generated.\n`angles`      | Vector (`numClusts` x *1*) containing the effective angles of the lines used to generate clusters.\n`lengths`     | Vector (`numClusts` x *1*) containing the effective lengths of the lines used to generate clusters.\n\n## Usage examples\n\n### Basic usage\n\n```MATLAB\n[data, cp, idx] = generateData(pi / 2, pi / 8, 5, 15, 15, 5, 1, 2, 200);\n```\n\nThe 
previous command creates 5 clusters with a total of 200 points, a mean angle\nof π/2 (*std*=π/8), an average separation of 15 units in both the *x* and *y*\ndirections, a mean length of 5 units (*std*=1), and a \"fatness\" (lateral\nspread) of 2 units.\n\nThe following command plots the generated clusters, coloring each point by its cluster index:\n\n```MATLAB\nscatter(data(:, 1), data(:, 2), 8, idx);\n```\n\n### Using optional parameters\n\nThe following command generates 7 clusters with a total of 100,000 points.\nAll three optional parameters are set, overriding their default values.\n\n```MATLAB\n[data, cp, idx] = generateData(0, pi / 16, 7, 25, 25, 25, 5, 1, 100000, ...\n  'pointDist', 'norm', 'pointOffset', '1D', 'allowEmpty', true);\n```\n\nThe generated clusters can be visualized with the same `scatter` command used\nin the previous example.\n\n### Reproducible cluster generation\n\nTo make cluster generation reproducible, set the random number generator seed\nto a specific value (e.g., 123) before generating the data:\n\n```MATLAB\nrng(123);\n```\n\nIn GNU Octave, use the following commands instead:\n\n```MATLAB\nrand(\"state\", 123);\nrandn(\"state\", 123);\n```\n\n## Previous behaviors and reproducibility of results\n\nBefore [v2.0.0](https://github.com/fakenmc/generateData/tree/v2.0.0), lines\nsupporting clusters were parameterized with slopes instead of angles. 
We found\nthat slopes made it difficult to choose line orientations, hence the change to\nangles, which are much easier to work with.\nVersion [v1.3.0](https://github.com/fakenmc/generateData/tree/v1.3.0) still\nuses slopes, for those who prefer this behavior.\n\nTo reproduce results from studies published before May 2020, use version\n[v1.2.0](https://github.com/fakenmc/generateData/tree/v1.2.0) instead.\nSubsequent versions were optimized in a way that changed the order in which\nthe required random values are generated, so they produce slightly different\nresults for the same seed.\n\n## Reference\n\nIf you use this function in your work, please cite the following reference:\n\n- Fachada, N., \u0026 Rosa, A. C. (2020).\n[generateData—A 2D data generator](https://doi.org/10.1016/j.simpa.2020.100017).\nSoftware Impacts, 4:100017. doi: [10.1016/j.simpa.2020.100017](https://doi.org/10.1016/j.simpa.2020.100017)\n\n## Multidimensional alternative\n\nThe [*MOCluGen*](https://github.com/clugen/MOCluGen) toolbox extends\n*generateData* with arbitrary dimensions and statistical distributions.\nIn other words, *generateData* offers a subset of the functionality provided\nby *MOCluGen*, although it is probably simpler to use.\n\n## License\n\nThis script is made available under the [MIT License](LICENSE).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnunofachada%2Fgeneratedata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnunofachada%2Fgeneratedata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnunofachada%2Fgeneratedata/lists"}