{"id":20207002,"url":"https://github.com/koolreport/cleandata","last_synced_at":"2025-09-06T05:48:04.090Z","repository":{"id":57008487,"uuid":"185561557","full_name":"koolreport/cleandata","owner":"koolreport","description":"Make your data clean before making report","archived":false,"fork":false,"pushed_at":"2023-03-31T08:06:59.000Z","size":7,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-24T11:13:11.366Z","etag":null,"topics":["data-clean","data-cleaning","mysql-reporting-tools","php-reporting-tools","reporting-engine"],"latest_commit_sha":null,"homepage":"https://www.koolreport.com/","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/koolreport.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-05-08T08:09:41.000Z","updated_at":"2023-03-06T06:45:50.000Z","dependencies_parsed_at":"2022-08-21T14:50:46.997Z","dependency_job_id":null,"html_url":"https://github.com/koolreport/cleandata","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/koolreport%2Fcleandata","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/koolreport%2Fcleandata/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/koolreport%2Fcleandata/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/koolreport%2Fcleandata/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/koolreport","download_url":"https://codeload.github.com/koolrep
ort/cleandata/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248217131,"owners_count":21066633,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-clean","data-cleaning","mysql-reporting-tools","php-reporting-tools","reporting-engine"],"created_at":"2024-11-14T05:27:05.688Z","updated_at":"2025-04-10T12:33:16.674Z","avatar_url":"https://github.com/koolreport.png","language":"PHP","readme":"# Introduction\n\nMissing data is a recurring problem in data analysis and data mining. The `cleandata` package gives you methods to handle missing data.\n\n# Installation\n\n## By downloading .zip file\n\n1. [Download](https://www.koolreport.com/packages/cleandata)\n2. Unzip the zip file\n3. Copy the `cleandata` folder into the `koolreport` folder so that it looks like this:\n\n```bash\nkoolreport\n├── core\n├── cleandata\n```\n\n## By composer\n\n```\ncomposer require koolreport/cleandata\n```\n\n# Documentation\n\nMissing values normally reach KoolReport as `null` values. We handle them by either __dropping the row__ or __filling in a new value__.\n\n## DropNull\n\nThe `DropNull` process drops any row that contains a `null` value or that reaches a certain number of `null` occurrences.\n\nLet us look at an example:\n\n```\n$this-\u003esrc('db')\n-\u003equery(\"select * from customers\")\n-\u003epipe(new DropNull())\n-\u003epipe($this-\u003edataStore('clean_data'));\n```\n\nAbove is the simplest example of using the `DropNull` process. Every row that has a `null` value will be dropped. 
As a result, the returned data contains only those __customers__ with complete information.\n\n### Target certain columns only\n\nSometimes you only want to drop the row if certain columns have `null` values:\n\n```\n-\u003epipe(new DropNull(array(\n    \"targetColumns\"=\u003earray(\"salary\",\"tax\")\n)))\n```\n\n### Exclude some columns\n\nIf you want to target all columns except a few that are not important, you can do:\n\n```\n-\u003epipe(new DropNull(array(\n    \"excludedColumns\"=\u003earray(\"address\",\"city\")\n)))\n```\n\n### Target specific type of columns\n\nFor example, you can target `number` columns only. If any of those columns has a `null` value, the row will be dropped:\n\n```\n-\u003epipe(new DropNull(array(\n    \"targetColumnType\"=\u003e\"number\"\n)))\n```\n\nYou can also target the other column types: `string`, `date`, `datetime`, `time`.\n\n### Threshold\n\nFor example, to drop a row if it contains more than 2 `null` values:\n\n```\n-\u003epipe(new DropNull(array(\n    \"thresh\"=\u003e3,\n)))\n```\n\n### Targeted value\n\nWhat if you want to drop rows with the `0` value rather than the `null` value? If `0` is what counts as missing data for you, you can do:\n\n```\n-\u003epipe(new DropNull(array(\n    \"targetValue\"=\u003e0,\n)))\n```\n\nOf course, you can set any target value, whether it is a number or a string. The default value of `targetValue` is `null`.\n\n### Strictly Null\n\nBy default, an empty string or a `0` value is also treated as `null`. To enable strict comparison of both value and type, set the following:\n\n```\n-\u003epipe(new DropNull(array(\n    \"strict\"=\u003etrue,\n)))\n```\n\n\n## FillNull\n\nThe `FillNull` process is another method of cleaning data. 
We do not drop rows with `null` values; rather, we fill each `null` value with a new value.\n\n```\n-\u003epipe(new FillNull(array(\n    \"newValue\"=\u003e0\n)))\n```\n\nThe above code will fill all `null` values with `0`.\n\n### Targeted value\n\nWhat if you want to target the `0` value instead? You can do:\n\n```\n-\u003epipe(new FillNull(array(\n    \"targetValue\"=\u003e0,\n    \"newValue\"=\u003e10,\n)))\n```\n\n### Fill missing values with MEDIAN and MEAN\n\nIn the examples above, we fill missing values with a value we choose. However, a better method is to fill them with the mean or median of the column values, which seems a more elegant solution. You can do:\n\n```\n-\u003epipe(new FillNull(array(\n    \"newValue\"=\u003eFillNull::MEAN,\n)))\n```\n\nFor the median, you can do:\n\n```\n-\u003epipe(new FillNull(array(\n    \"newValue\"=\u003eFillNull::MEDIAN,\n)))\n```\n\n### Target some specific columns\n\nYou can apply the filling action to specific columns only:\n\n```\n-\u003epipe(new FillNull(array(\n    \"targetColumns\"=\u003earray(\"salary\",\"tax\"),\n)))\n```\n\n### Exclude some columns\n\nIf some columns are not important and their missing values do not matter, you can exclude them:\n\n```\n-\u003epipe(new FillNull(array(\n    \"excludedColumns\"=\u003earray(\"lastname\",\"gender\"),\n)))\n```\n\n### Target some specific column type\n\nIf you want, you can apply the fill to `number` columns only:\n\n```\n-\u003epipe(new FillNull(array(\n    \"targetColumnType\"=\u003e\"number\"\n)))\n```\n\n### Strictly Null\n\nBy default, an empty string or a `0` value is also treated as `null`. To enable strict comparison of both value and type, set the following:\n\n```\n-\u003epipe(new FillNull(array(\n    \"strict\"=\u003etrue,\n)))\n```\n\n\n## Support\n\n\nPlease use our forum if you need support; that way other people can benefit as well. 
If your support request needs privacy, you may email us at __support@koolreport.com__.","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkoolreport%2Fcleandata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkoolreport%2Fcleandata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkoolreport%2Fcleandata/lists"}