{"id":22443077,"url":"https://github.com/riversun/ml-fake-data-maker","last_synced_at":"2026-04-13T21:32:00.323Z","repository":{"id":57740070,"uuid":"198040692","full_name":"riversun/ml-fake-data-maker","owner":"riversun","description":"Generate fake data for machine learning like regression analysis","archived":false,"fork":false,"pushed_at":"2020-10-13T14:44:47.000Z","size":105,"stargazers_count":1,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-07-05T17:33:53.988Z","etag":null,"topics":["arff","arff-generator","dummy-data","fake-data","generator","machine-learning","prediction","regression","spark","weka"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/riversun.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-07-21T10:02:20.000Z","updated_at":"2023-10-01T01:59:54.000Z","dependencies_parsed_at":"2022-08-30T10:51:29.120Z","dependency_job_id":null,"html_url":"https://github.com/riversun/ml-fake-data-maker","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/riversun/ml-fake-data-maker","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/riversun%2Fml-fake-data-maker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/riversun%2Fml-fake-data-maker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/riversun%2Fml-fake-data-maker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/riversun%2Fml-fake-data-maker/manifests","owner_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/riversun","download_url":"https://codeload.github.com/riversun/ml-fake-data-maker/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/riversun%2Fml-fake-data-maker/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31771816,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-13T20:17:16.280Z","status":"ssl_error","status_checked_at":"2026-04-13T20:17:08.216Z","response_time":93,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arff","arff-generator","dummy-data","fake-data","generator","machine-learning","prediction","regression","spark","weka"],"created_at":"2024-12-06T02:22:24.723Z","updated_at":"2026-04-13T21:32:00.298Z","avatar_url":"https://github.com/riversun.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Overview\n- Library to generate fake data for machine learning (e.g., prediction with linear regression, random forest, GBT)\n- Supports CSV and ARFF formats\n\nIt is licensed under [MIT](https://opensource.org/licenses/MIT).\n\n# How To Use\n\nGenerate fake data that can be used for linear regression and similar models.\n\n**Maven 
dependency**\n\n```xml\n\u003cdependency\u003e\n\u003cgroupId\u003eorg.riversun\u003c/groupId\u003e\n\u003cartifactId\u003efake-data-maker\u003c/artifactId\u003e\n\u003cversion\u003e1.1.0\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n**Example code (CSV)**\n\nGenerate fake data in **CSV** format:\n\n```java\nimport java.io.File;\n\n/**\n *\n * Generate fake data that can be used in linear regression analysis\n *\n */\nclass Example {\n\n    public static void main(String[] args) {\n\n        // Set base price\n        double basePrice = 10;\n\n        // Create attributes\n        Attribute material = new Attribute(\n                // attribute name\n                \"material\",\n                // new AttributeNominal(categorical value, weight applied to the objective variable)\n                new AttributeNominal(\"Diamond\", 20),\n                new AttributeNominal(\"Platinum\", 15),\n                new AttributeNominal(\"Gold\", 10),\n                new AttributeNominal(\"Silver\", 3));\n\n        Attribute brand = new Attribute(\n                \"brand\",\n                new AttributeNominal(\"WorldTopBrand\", 8.0),\n                new AttributeNominal(\"FamouseBrand\", 4.5),\n                new AttributeNominal(\"NationalBrand\", 2.0),\n                new AttributeNominal(\"NoBrand\", 1.0));\n\n        Attribute shop = new Attribute(\n                \"shop\",\n                new AttributeNominal(\"BrandStore\", 1.7),\n                new AttributeNominal(\"DepartmentStore\", 1.5),\n                new AttributeNominal(\"MassRetailer\", 1.2),\n                new AttributeNominal(\"DiscountStore\", 1.1));\n\n        Attribute shape = new Attribute(\n                \"shape\",\n                new AttributeNominal(\"Ring\", 1.10),\n                new AttributeNominal(\"Neckless\", 1.07),\n                new AttributeNominal(\"Earrings\", 1.05),\n                new AttributeNominal(\"Brooch\", 1.05),\n                new AttributeNominal(\"Brace\", 
1.15));\n\n        Attribute weight = new Attribute(\"weight\",\n                new AttributeNumeric(10, 60, ComputeMethod.LOG10, 1));\n\n        FakeDataSet ds = new FakeDataSet.Builder()\n                .type(DataType.REGRESSION)\n                .outputFormat(OutputFormat.CSV)// CSV or ARFF\n                .nameOfData(\"gemsales\")\n                .addAttr(material)\n                .addAttr(shape)\n                .addAttr(weight)\n                .addAttr(brand)\n                .addAttr(shop)\n                .compliantListener(new DataRuleCompliantListener() {\n                    @Override\n                    public boolean isCompliant(AttributeCheck check) {\n                        // Return false for attribute combinations that violate the data rules\n                        if (check.nominalEquals(\"brand\", \"NoBrand\") \u0026\u0026 check.nominalEquals(\"shop\", \"BrandStore\")) {\n                            // NoBrand items are never sold at a BrandStore\n                            return false;\n                        }\n                        if (check.nominalEquals(\"brand\", \"WorldTopBrand\") \u0026\u0026\n                                (check.nominalEquals(\"shop\", \"DiscountStore\") || check.nominalEquals(\"shop\", \"MassRetailer\"))) {\n                            // WorldTopBrand items are not sold at a DiscountStore or MassRetailer\n                            return false;\n                        }\n                        if (check.nominalEquals(\"brand\", \"FamouseBrand\") \u0026\u0026\n                                check.nominalEquals(\"shop\", \"DiscountStore\")) {\n                            // FamouseBrand items are not sold at a DiscountStore\n                            return false;\n                        }\n\n                        return true;\n                    }\n                })\n                .numOfLines(20)// number of data rows\n                .targetLabel(\"price\")// name of the target (objective) variable\n                .targetInitialValue(basePrice)\n                .valueVolatility(0.0)\n                .withHeader(true)\n   
             .withId(true)\n                .build();\n\n        //ds.save(new File(\"c:/temp/data.csv\"), \"UTF-8\");// save generated data to a file\n        System.out.println(ds.get());// print generated data\n\n    }\n}\n```\n\n**Result**\n\nThe program prints CSV data like the following.\n\nWith this data, the price of a gem (objective variable) can be predicted from its attributes (explanatory variables) using linear regression or a similar method.\n\n```\nid,material,shape,weight,brand,shop,price\n0,Gold,Brace,42,NationalBrand,DepartmentStore,561\n1,Gold,Earrings,51,NoBrand,DepartmentStore,269\n2,Silver,Ring,49,WorldTopBrand,DepartmentStore,672\n3,Gold,Earrings,43,WorldTopBrand,BrandStore,2337\n4,Platinum,Ring,10,FamouseBrand,BrandStore,1279\n5,Diamond,Neckless,42,NoBrand,DiscountStore,383\n6,Gold,Earrings,13,WorldTopBrand,BrandStore,1603\n7,Gold,Brace,59,FamouseBrand,BrandStore,1558\n8,Platinum,Earrings,47,WorldTopBrand,DepartmentStore,3173\n9,Silver,Ring,38,NationalBrand,DepartmentStore,156\n10,Diamond,Neckless,25,FamouseBrand,BrandStore,2299\n11,Platinum,Ring,34,FamouseBrand,BrandStore,1940\n12,Gold,Brooch,21,NoBrand,DiscountStore,154\n13,Gold,Earrings,18,WorldTopBrand,DepartmentStore,1607\n14,Gold,Earrings,35,NoBrand,DiscountStore,178\n15,Platinum,Ring,37,NationalBrand,BrandStore,881\n16,Silver,Brooch,39,NoBrand,DiscountStore,55\n17,Gold,Earrings,43,NationalBrand,DepartmentStore,514\n18,Silver,Brace,35,FamouseBrand,BrandStore,409\n19,Platinum,Brooch,13,NationalBrand,DiscountStore,393\n20,Gold,Earrings,53,NationalBrand,DiscountStore,400\n```\n\n**Example code (ARFF)**\n\nGenerate fake data in **ARFF** format:\n\n```java\nimport java.io.File;\n\n/**\n *\n * Generate fake data that can be used in linear regression analysis\n *\n */\nclass ArffExample {\n\n    public static void main(String[] args) {\n\n        // Set base price\n        double basePrice = 10;\n\n        // Create attributes\n        Attribute material = new Attribute(\n                // attribute name\n               
 \"material\",\n                // new AttributeNominal(categorical value, weight applied to the objective variable)\n                new AttributeNominal(\"Diamond\", 20),\n                new AttributeNominal(\"Platinum\", 15),\n                new AttributeNominal(\"Gold\", 10),\n                new AttributeNominal(\"Silver\", 3));\n\n        Attribute brand = new Attribute(\n                \"brand\",\n                new AttributeNominal(\"WorldTopBrand\", 8.0),\n                new AttributeNominal(\"FamouseBrand\", 4.5),\n                new AttributeNominal(\"NationalBrand\", 2.0),\n                new AttributeNominal(\"NoBrand\", 1.0));\n\n        Attribute shop = new Attribute(\n                \"shop\",\n                new AttributeNominal(\"BrandStore\", 1.7),\n                new AttributeNominal(\"DepartmentStore\", 1.5),\n                new AttributeNominal(\"MassRetailer\", 1.2),\n                new AttributeNominal(\"DiscountStore\", 1.1));\n\n        Attribute shape = new Attribute(\n                \"shape\",\n                new AttributeNominal(\"Ring\", 1.10),\n                new AttributeNominal(\"Neckless\", 1.07),\n                new AttributeNominal(\"Earrings\", 1.05),\n                new AttributeNominal(\"Brooch\", 1.05),\n                new AttributeNominal(\"Brace\", 1.15));\n\n        Attribute weight = new Attribute(\"weight\",\n                new AttributeNumeric(10, 60, ComputeMethod.LOG10, 1));\n\n        FakeDataSet ds = new FakeDataSet.Builder()\n                .type(DataType.REGRESSION)\n                .outputFormat(OutputFormat.ARFF)// CSV or ARFF\n                .nameOfData(\"gemsales\")\n                .addAttr(material)\n                .addAttr(shape)\n                .addAttr(weight)\n                .addAttr(brand)\n                .addAttr(shop)\n                .compliantListener(new DataRuleCompliantListener() {\n                    @Override\n                    public boolean isCompliant(AttributeCheck 
check) {\n                        // Return false for attribute combinations that violate the data rules\n                        if (check.nominalEquals(\"brand\", \"NoBrand\") \u0026\u0026 check.nominalEquals(\"shop\", \"BrandStore\")) {\n                            // NoBrand items are never sold at a BrandStore\n                            return false;\n                        }\n                        if (check.nominalEquals(\"brand\", \"WorldTopBrand\") \u0026\u0026\n                                (check.nominalEquals(\"shop\", \"DiscountStore\") || check.nominalEquals(\"shop\", \"MassRetailer\"))) {\n                            // WorldTopBrand items are not sold at a DiscountStore or MassRetailer\n                            return false;\n                        }\n                        if (check.nominalEquals(\"brand\", \"FamouseBrand\") \u0026\u0026\n                                check.nominalEquals(\"shop\", \"DiscountStore\")) {\n                            // FamouseBrand items are not sold at a DiscountStore\n                            return false;\n                        }\n\n                        return true;\n                    }\n                })\n                .numOfLines(20)// number of data rows\n                .targetLabel(\"price\")// name of the target (objective) variable\n                .targetInitialValue(basePrice)\n                .valueVolatility(0.0)\n                .withHeader(true)\n                .withId(true)\n                .build();\n\n        //ds.save(new File(\"c:/temp/data.arff\"), \"UTF-8\");// save generated data to a file\n        System.out.println(ds.get());// print generated data\n\n    }\n}\n```\n\n# **Download example data for learning and regression**\n\nThe data files are also licensed under MIT.\n\n# **Gem Prices**\n\n**Purpose**\n\nPredict the sales price of a gem from its attributes.\n\n**Format**\n\n```\nid,material,shape,weight,brand,shop,price\n0,Gold,Brace,42,NationalBrand,DepartmentStore,561\n1,Gold,Earrings,51,NoBrand,DepartmentStore,269\n...\n```\n\n**Data Files**\n\n- [**CSV 
File(EN)**](https://raw.githubusercontent.com/riversun/ml-fake-data-maker/master/datasets/gem_price.csv)\n- [**CSV File(JA) UTF-8 with BOM**](https://raw.githubusercontent.com/riversun/ml-fake-data-maker/master/datasets/gem_price_ja.csv)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Friversun%2Fml-fake-data-maker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Friversun%2Fml-fake-data-maker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Friversun%2Fml-fake-data-maker/lists"}