{"id":18702271,"url":"https://github.com/justinshenk/learnthestructure","last_synced_at":"2025-11-09T01:30:28.144Z","repository":{"id":68948244,"uuid":"70152841","full_name":"JustinShenk/learnthestructure","owner":"JustinShenk","description":"Implementation of libpgm for the Wisconsin Breast Cancer Database","archived":false,"fork":false,"pushed_at":"2016-10-09T23:41:13.000Z","size":207,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-12-28T05:43:06.175Z","etag":null,"topics":["bayesian-network","python"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JustinShenk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-10-06T12:42:09.000Z","updated_at":"2019-06-22T15:10:17.000Z","dependencies_parsed_at":"2023-09-14T14:49:44.420Z","dependency_job_id":null,"html_url":"https://github.com/JustinShenk/learnthestructure","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JustinShenk%2Flearnthestructure","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JustinShenk%2Flearnthestructure/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JustinShenk%2Flearnthestructure/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JustinShenk%2Flearnthestructure/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JustinShenk","download_url":"https://codeload.github.com/JustinShenk/learnthestructure/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239565644,"owners_count":19660154,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bayesian-network","python"],"created_at":"2024-11-07T11:45:21.775Z","updated_at":"2025-11-09T01:30:28.079Z","avatar_url":"https://github.com/JustinShenk.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Bayesian Network Structure Learning with Breast Cancer Data\n\nImplementation of [libpgm](https://github.com/CyberPoint/libpgm) to estimate the Bayesian Network structure of the [Wisconsin Breast Cancer Database](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Original)). This code was written by Marc Vidal and Justin Shenk for Nico Potyka's \"Basic Methods in Probabilistic Reasoning\" seminar.\n\n# Project structure\nPython script is located in `libpgm-1.3/implementation/learnthestructure.py`.\n\nThe original data is located in `libpgm-1.3/data/breast-cancer-wisconsin.data`.\n\n# Getting started\nClone the repository\n\n`git clone https://github.com/JustinShenk/learnthestructure.git`\n\nInstall libpgm package\n\n`cd learnthestructure`\n\n`cd libpgm-1.3`\n\n`[sudo] python setup.py install`\n\nRun the script\n\n`cd implementation`\n\n`python learnthestructure.py [p-value parameter] [# of bins] [lg]`\n\n\nThe data is discrete rather than continuous. To test the binning of the linear Gaussian function, however, add `lg` as an argument and specify the number of bins. By default, p-value threshold is .05 and linear Gaussian data is discretized into 10 bins.\n\n# Data variables\n\n| *#* | Attribute | Domain |\n| --- | --- | --- |\n| 1. | Sample code number | id number |\n| 2. | Clump Thickness | 1 - 10 |\n| 3. | Uniformity of Cell Size  | 1 - 10 |\n| 4. | Uniformity of Cell Shape | 1 - 10 |\n| 5. | Marginal Adhesion | 1 - 10 |\n| 6. | Single Epithelial Cell Size  | 1 - 10 |\n| 7. | Bare Nuclei | 1 - 10 |\n| 8. | Bland Chromatin | 1 - 10 |\n| 9. | Normal Nucleoli  | 1 - 10 |\n| 10. | Mitoses | 1 - 10 |\n| 11. | Class: | (2 for benign, 4 for malignant) |\n\n# Output\n\nThis implementation outputs estimated edges and vertices from the data in files marked with optional arguments:\n`/data/breast-data-result-0.1-10.txt`\n\nCPDs are produced:\n`/data/breast-data-result-CPDs.txt`\n\nExample output is at the bottom.\n\n# Query\n\nOpen the python shell from the implementation folder and instantiate the class:\n`from learnthestructure import LearnTheStructure`\n`bn = LearnTheStructure()`\n\nWhat is the probability that a patient's cancer is malignant given that Bare Nuclei has a value of 10?\n\n`evidence = dict(BareNuclei=10)`\n\n`query = dict(Class=[4])`\n\n`bn.query_it(evidence,query)`\n\n`The probability of  {'Class': [4]}  given  {'BareNuclei': 10}  is  0.808146976884`\n\n\n```\nEdges:\n[\n  [\n    \"UniformityofCellShape\",\n    \"UniformityofCellSize\"\n  ],\n  [\n    \"UniformityofCellSize\",\n    \"Class\"\n  ],\n  [\n    \"BareNuclei\",\n    \"Class\"\n  ]\n]\n```\n```\nVertices data with CPD:\n{\n  \"SingleEpithelialCellSize\": {\n    \"vals\": [\n      2,\n      7,\n      3,\n      1,\n      6,\n      4,\n      5,\n      8,\n      10,\n      9\n    ],\n    \"numoutcomes\": 10,\n    \"cprob\": [\n      0.5522174535050072,\n      0.017167381974248927,\n      0.10300429184549356,\n      0.06723891273247497,\n      0.058655221745350504,\n      0.06866952789699571,\n      0.055793991416309016,\n      0.030042918454935622,\n      0.044349070100143065,\n      0.002861230329041488\n    ],\n    \"parents\": [],\n    \"children\": []\n  },\n  \"UniformityofCellSize\": {\n    \"vals\": [\n      1,\n      4,\n      8,\n      10,\n      2,\n      3,\n      7,\n      5,\n      6,\n      9\n    ],\n    \"numoutcomes\": 10,\n    \"cprob\": [\n      0.5493562231759657,\n      0.05722460658082976,\n      0.04148783977110158,\n      0.09585121602288985,\n      0.06437768240343347,\n      0.07439198855507868,\n      0.027181688125894134,\n      0.04291845493562232,\n      0.03862660944206009,\n      0.008583690987124463\n    ],\n    \"parents\": [],\n    \"children\": [\n      \"UniformityofCellShape\",\n      \"Class\"\n    ]\n  },\n  \"BareNuclei\": {\n    \"vals\": [\n      1,\n      10,\n      2,\n      4,\n      3,\n      9,\n      7,\n      5,\n      8,\n      6\n    ],\n    \"numoutcomes\": 10,\n    \"cprob\": [\n      0.597997138769671,\n      0.1888412017167382,\n      0.04291845493562232,\n      0.027181688125894134,\n      0.04005722460658083,\n      0.012875536480686695,\n      0.011444921316165951,\n      0.04291845493562232,\n      0.030042918454935622,\n      0.005722460658082976\n    ],\n    \"parents\": [],\n    \"children\": [\n      \"Class\"\n    ]\n  },\n  \"UniformityofCellShape\": {\n    \"vals\": [\n      1,\n      4,\n      8,\n      10,\n      2,\n      3,\n      5,\n      6,\n      7,\n      9\n    ],\n    \"numoutcomes\": 10,\n    \"cprob\": {\n      \"['1']\": [\n        0.8619791666666666,\n        0.013020833333333334,\n        0.0,\n        0.0,\n        0.06770833333333333,\n        0.057291666666666664,\n        0.0,\n        0.0,\n        0.0,\n        0.0\n      ],\n      \"['8']\": [\n        0.0,\n        0.06896551724137931,\n        0.3793103448275862,\n        0.06896551724137931,\n        0.0,\n        0.034482758620689655,\n        0.0,\n        0.06896551724137931,\n        0.27586206896551724,\n        0.10344827586206896\n      ],\n      \"['3']\": [\n        0.17307692307692307,\n        0.15384615384615385,\n        0.019230769230769232,\n        0.0,\n        0.21153846153846154,\n        0.25,\n        0.1346153846153846,\n        0.057692307692307696,\n        0.0,\n        0.0\n      ],\n      \"['6']\": [\n        0.0,\n        0.2222222222222222,\n        0.037037037037037035,\n        0.037037037037037035,\n        0.0,\n        0.07407407407407407,\n        0.14814814814814814,\n        0.3333333333333333,\n        0.1111111111111111,\n        0.037037037037037035\n      ],\n      \"['10']\": [\n        0.0,\n        0.029850746268656716,\n        0.08955223880597014,\n        0.7164179104477612,\n        0.014925373134328358,\n        0.029850746268656716,\n        0.029850746268656716,\n        0.029850746268656716,\n        0.04477611940298507,\n        0.014925373134328358\n      ],\n      \"['7']\": [\n        0.0,\n        0.10526315789473684,\n        0.2631578947368421,\n        0.15789473684210525,\n        0.0,\n        0.0,\n        0.05263157894736842,\n        0.05263157894736842,\n        0.3157894736842105,\n        0.05263157894736842\n      ], ...\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjustinshenk%2Flearnthestructure","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjustinshenk%2Flearnthestructure","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjustinshenk%2Flearnthestructure/lists"}