{"id":19215046,"url":"https://github.com/chgl16/data-mining-algorithm","last_synced_at":"2025-05-12T23:22:25.492Z","repository":{"id":107504213,"uuid":"192140732","full_name":"chgl16/data-mining-algorithm","owner":"chgl16","description":":bar_chart:  Common data mining algorithms: Apriori for association analysis, decision trees for data classification, and K-means for data clustering","archived":false,"fork":false,"pushed_at":"2019-06-16T03:47:34.000Z","size":9,"stargazers_count":24,"open_issues_count":0,"forks_count":7,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-04-20T19:37:23.695Z","etag":null,"topics":["apriori-algorithm","correlation-analysis","data-classification","data-mining-algorithms","k-means-clustering"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chgl16.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-06-16T02:07:01.000Z","updated_at":"2024-12-23T11:47:22.000Z","dependencies_parsed_at":"2023-05-17T15:15:17.883Z","dependency_job_id":null,"html_url":"https://github.com/chgl16/data-mining-algorithm","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chgl16%2Fdata-mining-algorithm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chgl16%2Fdata-mining-algorithm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chgl16%2Fdata-mining-algorithm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chgl16%2Fdata
-mining-algorithm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chgl16","download_url":"https://codeload.github.com/chgl16/data-mining-algorithm/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253838100,"owners_count":21972093,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apriori-algorithm","correlation-analysis","data-classification","data-mining-algorithms","k-means-clustering"],"created_at":"2024-11-09T14:12:34.695Z","updated_at":"2025-05-12T23:22:25.485Z","avatar_url":"https://github.com/chgl16.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Mining Algorithms\n1. [Association Analysis: Apriori Algorithm](#association-analysis-apriori-algorithm)  \n2. [Data Classification: Decision Tree Algorithm](#data-classification-decision-tree-algorithm)\n3. [Data Clustering: K-means Algorithm](#data-clustering-k-means-algorithm)\n  \n\n\u003chr\u003e\n\n## Association Analysis: Apriori Algorithm\n#### 1. [Dataset](关联分析（Apriori）/data.txt)  \nThe dataset consists of supermarket transactions. The itemset of all products is    \n```bash    \nI = {bread, beer, cake, cream, milk, tea}\n```\nA single transaction such as  \n```bash\nTi = {bread, beer, milk}\n```\nis abbreviated as  \n```bash\nTi = {a, b, d}\n```\nA sample of the data.txt dataset:\n```bash\na, d, e,f\na, d, e\nc, e\ne, f\n...\n```\n\n#### 2. [Implementation](关联分析（Apriori）/correlation_analysis.py)\nThis uses the classic Apriori algorithm: it scans the transaction set pass by pass, computes the *candidate k-itemsets Ck*, then discards the itemsets with low **support (sup)** to obtain the *frequent k-itemsets Lk*. Only itemsets up to the *frequent 3-itemsets* are computed, and the confidence of the association rules is calculated at the end.\n\u003e The k-th candidate set is built only by joining itemsets from the frequent (k-1)-itemsets; the transaction set is then scanned to obtain the support of each itemset in Ck.       \n\n#### 3. Output\n\u003ccenter\u003e\n\u003cimg alt=\"Algorithm output\" src=\"https://i.loli.net/2019/06/16/5d05ad0e8f2e762317.png\" width=\"80%\" /\u003e \n\u003c/center\u003e\n\n\n\u003chr\u003e\n\n## Data Classification: Decision Tree Algorithm\n#### 1. 
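[Dataset](数据分类（决策树）/data.txt)\nSamples are classified into two classes, fat and thin, based on their height and weight values. The data is self-generated and fairly crude; see [*data_generation.py](数据分类（决策树）/data_generation.py).  \nA sample of the dataset:\n```bash\n184 77 fat\n189 81 fat\n178 75 fat\n...\n```\n\n#### 2. [Implementation](数据分类（决策树）/decision_tree.py)\nThis calls an existing Python library (scikit-learn), so the code is fairly simple.\n```python\nfrom sklearn import tree\nfrom sklearn.metrics import precision_recall_curve\nfrom sklearn.metrics import classification_report\nfrom sklearn.model_selection import train_test_split\n\n...\n\n# Split the data: 80% for training, 20% for testing\nx_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)\n\n# Build and train the model with DecisionTreeClassifier\nclf = tree.DecisionTreeClassifier(criterion='entropy')\nclf.fit(x_train, y_train)\n\n...\n```\nAfter the results are printed, the decision tree is also saved to the file [*tree.dot](数据分类（决策树）/tree.dot). The dot command can then generate an image of the tree (or use an [online converter](http://www.webgraphviz.com/)).\n```python\n# Save the decision tree as a dot file for later rendering\nwith open(\"tree.dot\", 'w') as f:\n    tree.export_graphviz(clf, out_file=f)\n```\n#### 3. Output  \n\u003ccenter\u003e\n\u003cimg alt=\"Algorithm output\" src=\"https://i.loli.net/2019/06/16/5d05b41f3cca371767.png\" width=\"80%\" /\u003e \n\u003c/center\u003e\n\n\u003ccenter\u003e\n\u003cspan\u003eDecision tree\u003c/span\u003e  \n\u003cbr\u003e\n\u003cimg alt=\"Decision tree\" src=\"https://i.loli.net/2019/06/16/5d05b41f6850332395.png\" width=\"80%\" /\u003e\n\u003c/center\u003e\n\n\u003chr\u003e\n\n## Data Clustering: K-means Algorithm\n#### 1. Dataset\nThe dataset is the well-known iris point set that ships with the Python library (scikit-learn).\n```python\nfrom sklearn import datasets\n\niris = datasets.load_iris()\nX, y = iris.data, iris.target\n```\nA sample of the dataset:\n```bash\n[1.5 0.2]\n[3.2 0.2]\n[3.1 0.2]\n[4.6 0.2]\n...\n```\n\n#### 2. [Implementation](数据聚类（K-means）/k-means.py)\nK-means requires the number of clusters k to be specified in advance; the samples have only attributes and no class labels.  \nOutline of the steps:  \n1. Randomly select k samples from the dataset X as the initial cluster representatives; each representative stands for one cluster.\n2. For every sample in the dataset, compute its distance to each of the k initial representatives (Euclidean distance can be used for d) and assign it to the nearest cluster. This completes one round of clustering.\n3. Once the data is assigned, compute the mean of each cluster and use it as that cluster's new representative, which gives k new representatives.\n4. As in step 2, compute the distance from every point to the representatives again and assign each point to the nearest cluster.\n5. 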
Repeat steps 3 and 4 until the clusters no longer change (the assignment of samples is fixed), i.e. the sum-of-squared-errors criterion function has converged to a (local) minimum.\n  \n#### 3. Output\n\u003ccenter\u003e\n\u003cimg alt=\"Algorithm output\" src=\"https://i.loli.net/2019/06/16/5d05bb1a54a9561636.png\" width=\"80%\" /\u003e\n\u003c/center\u003e\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchgl16%2Fdata-mining-algorithm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchgl16%2Fdata-mining-algorithm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchgl16%2Fdata-mining-algorithm/lists"}