{"id":31108750,"url":"https://github.com/microsoft/ML4C","last_synced_at":"2025-09-17T06:45:47.958Z","repository":{"id":65975816,"uuid":"541456472","full_name":"microsoft/ML4C","owner":"microsoft","description":"[SDM'23] ML4C: Seeing Causality Through Latent Vicinity","archived":false,"fork":false,"pushed_at":"2022-11-09T01:44:36.000Z","size":290,"stargazers_count":12,"open_issues_count":0,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-09-16T00:42:00.092Z","etag":null,"topics":["causal","causal-discovery","causality","causality-algorithms","discrete","graph","structure-learning","supervised-causal-learning","supervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/microsoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null}},"created_at":"2022-09-26T07:18:34.000Z","updated_at":"2024-10-03T02:09:48.000Z","dependencies_parsed_at":"2023-02-19T19:31:16.458Z","dependency_job_id":null,"html_url":"https://github.com/microsoft/ML4C","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/microsoft/ML4C","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FML4C","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FML4C/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FML4C/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FML4C/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/
microsoft","download_url":"https://codeload.github.com/microsoft/ML4C/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FML4C/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275549023,"owners_count":25484678,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-17T02:00:09.119Z","response_time":84,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["causal","causal-discovery","causality","causality-algorithms","discrete","graph","structure-learning","supervised-causal-learning","supervised-learning"],"created_at":"2025-09-17T06:45:24.224Z","updated_at":"2025-09-17T06:45:47.922Z","avatar_url":"https://github.com/microsoft.png","language":"Python","funding_links":[],"categories":["🚀 GitHub Repositories"],"sub_categories":["🌟 **Real-World Magic**"],"readme":"# ML4C: Seeing Causality Through Latent Vicinity\n\nML4C (Machine Learning for Causality) is a **supervised** causal discovery approach on **observational** data (and currently only supports **discrete** data) with theoretical guarantee. Starting from an input dataset with the corresponding skeleton provided, ML4C classifies (orients) whether each unshielded triple is a v-structure or not, and then outputs the corresponding CPDAG. 
Theoretically, ML4C is asymptotically correct by considering the graphical predicates in the vicinity of each unshielded triple. Empirically, ML4C remarkably outperforms other state-of-the-art algorithms in terms of accuracy, reliability, robustness and tolerance. See our [paper](https://arxiv.org/abs/2110.00637) for more details.\n\n---\n\n## Basic Usage Example\n\n```bash\ncd Examples/\npython main.py\n```\n\nThis example orients a given skeleton. A single call to `orient_skeleton` is enough. Its arguments are:\n\n+ `datapath`: `str`. Path to the observational data records. Should end with `.npy`, with the array in shape `(n_samples, n_variables)`. Only discrete data is currently supported, so the entries of this data array must be integers.\n+ `skeletonpath`: `str`. Path to the provided skeleton's adjacency matrix. Should end with `.txt`, with the array in shape `(n_variables, n_variables)`. If `i--j` is in the skeleton, then `a[i, j] = a[j, i] = 1`, and otherwise `a[i, j] = a[j, i] = 0`. Note that:\n  + You may obtain the skeleton from data using standard algorithms (e.g., PC or GES), and then undirect all edges.\n  + Alternatively, you may try out our [ML4S](https://www.microsoft.com/en-us/research/uploads/prod/2022/07/ML4S-camera-ready.pdf) with [code](https://github.com/microsoft/reliableAI/tree/main/causal-kit/ML4S).\n+ `savedir`: `str`. Directory in which to save the resulting CPDAG's adjacency matrix. If `a[i, j] = 1` and `a[j, i] = 0` then there is a directed edge `i-\u003ej`. If `a[i, j] = a[j, i] = 1` then there is an undirected edge `i--j`. Otherwise there is no edge between `i` and `j`.\n\n\n## Classifier Training\n\nIn this repository, we provide a pre-trained classifier `./Learner/ML4C_learner.model`. 
To reproduce this classifier, you may run the following steps:\n\n```bash\ncd Learner/\npython SynthesizeData.py      # Generate synthetic graph structures and data records\npython GenerateFeatures.py    # Generate features for each unshielded triple, based on the vicinity information\n```\n\nThen train a supervised classifier using any framework you like (we use [XGBoost](https://xgboost.readthedocs.io/en/stable/)). If you would like to customize your own classifier based on your own synthetic data (e.g., for the continuous case), you may also follow the steps above.\n\n---\n\n## Contributing\n\nThis project welcomes contributions and suggestions.  Most contributions require you to agree to a\nContributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us\nthe rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.\n\nWhen you submit a pull request, a CLA bot will automatically determine whether you need to provide\na CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions\nprovided by the bot. You will only need to do this once across all repos using our CLA.\n\nThis project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).\nFor more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or\ncontact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.\n\n## Trademarks\n\nThis project may contain trademarks or logos for projects, products, or services. 
Authorized use of Microsoft \ntrademarks or logos is subject to and must follow \n[Microsoft's Trademark \u0026 Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).\nUse of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.\nAny use of third-party trademarks or logos are subject to those third-party's policies.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2FML4C","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmicrosoft%2FML4C","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2FML4C/lists"}