{"id":21711259,"url":"https://github.com/distributedsystemsgroup/bleach","last_synced_at":"2025-09-22T23:46:08.539Z","repository":{"id":85177376,"uuid":"83581996","full_name":"DistributedSystemsGroup/Bleach","owner":"DistributedSystemsGroup","description":"A distributed stream data cleaning system","archived":false,"fork":false,"pushed_at":"2017-03-01T21:33:51.000Z","size":61,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-02-17T17:42:15.932Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DistributedSystemsGroup.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-03-01T17:29:04.000Z","updated_at":"2024-10-14T13:49:28.000Z","dependencies_parsed_at":"2023-03-07T13:00:36.200Z","dependency_job_id":null,"html_url":"https://github.com/DistributedSystemsGroup/Bleach","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DistributedSystemsGroup%2FBleach","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DistributedSystemsGroup%2FBleach/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DistributedSystemsGroup%2FBleach/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DistributedSystemsGroup%2FBleach/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/D
istributedSystemsGroup","download_url":"https://codeload.github.com/DistributedSystemsGroup/Bleach/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244666585,"owners_count":20490287,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-25T23:20:48.852Z","updated_at":"2025-09-22T23:46:03.470Z","avatar_url":"https://github.com/DistributedSystemsGroup.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Bleach\n\nBleach is a distributed stream data cleaning system built on Apache Storm. Unlike other data cleaning systems, \nwhich mainly focus on batch data cleaning, Bleach performs data cleaning directly on data streams without \nwaiting for all the data to be acquired. It aims to achieve efficient and accurate qualitative data cleaning \nunder real-time constraints. It currently supports functional dependency (FD) rules and conditional functional dependency (CFD) rules. More details can be found in our\n[paper](https://arxiv.org/abs/1609.05113#).\n\n\n## How to run it\n\nFirst, you need a cluster of machines on which Storm, Kafka and ZooKeeper are installed. 
Next, download the Bleach code and compile it with Maven:\n\n    $ git clone https://github.com/ychtian/Bleach\n    $ cd Bleach \u0026\u0026 mvn assembly:assembly\n\nThen, submit the jar to the Storm cluster to start Bleach:\n\n    $ storm jar target/bleach-1.0.0-jar-with-dependencies.jar storm.dataclean.TestTopology.TestRepair -config job.conf\n\nAll configuration settings are in the file job.conf.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdistributedsystemsgroup%2Fbleach","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdistributedsystemsgroup%2Fbleach","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdistributedsystemsgroup%2Fbleach/lists"}