{"id":27771446,"url":"https://github.com/wuba/lpa-detector","last_synced_at":"2025-04-29T22:39:57.060Z","repository":{"id":37131208,"uuid":"249886557","full_name":"wuba/LPA-Detector","owner":"wuba","description":"Optimize and improve the Label propagation algorithm","archived":false,"fork":false,"pushed_at":"2022-06-17T03:02:08.000Z","size":561,"stargazers_count":89,"open_issues_count":3,"forks_count":22,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-04-29T22:39:44.258Z","etag":null,"topics":["graph-computation","risk-propagation","spark"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wuba.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-03-25T04:26:39.000Z","updated_at":"2024-10-21T09:28:06.000Z","dependencies_parsed_at":"2022-06-24T06:51:36.343Z","dependency_job_id":null,"html_url":"https://github.com/wuba/LPA-Detector","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wuba%2FLPA-Detector","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wuba%2FLPA-Detector/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wuba%2FLPA-Detector/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wuba%2FLPA-Detector/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wuba","download_url":"https://codeload.github.com/wuba/LPA-Detector/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"git
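hub","repositories_count":251596619,"owners_count":21615012,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["graph-computation","risk-propagation","spark"],"created_at":"2025-04-29T22:39:56.525Z","updated_at":"2025-04-29T22:39:57.037Z","avatar_url":"https://github.com/wuba.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# An Improved LPA Algorithm Based on GraphX\nLPA (the label propagation algorithm) is a local community detection algorithm based on label propagation, and GraphX is a parallel graph computation framework.\nCombining the two greatly speeds up the algorithm's iterations. We have also optimized the iteration itself:\nconfidence weights can be set for different labels and different relationships, which improves the iteration results\nand supports fast algorithm iteration and risk propagation.\n[Detailed introduction article](https://mp.weixin.qq.com/s/JcvPa2EZvk_N4iRD2MQmwg)\n\n## Background\n### Reference Documents\nFor the LPA paper, see the [paper file](https://github.com/wuba/LPA-Detector/blob/master/paper/LPA.pdf).\n\nDetailed GraphX documentation: [GraphX documentation](http://spark.apache.org/)\n\n### Current State\n1. Existing LPA implementations are mostly written in Python or R and run on a single machine, so they cannot iterate over large-scale data.\n2. The LPA implementation bundled with GraphX does not support custom labels or confidence weights on labels and edges, so its iteration results cannot be tuned.\n\n## Application Overview\n1. Labels of known user nodes are propagated along relationships to unlabeled user nodes, and the propagation result is consumed as a score. For example, in a risk-control scenario, risk propagation spreads the good and bad user labels to neighboring nodes (first degree, second degree, and so on).\n2. Confidence weights for user nodes and weights for edges are adjustable. For example, in risk control a bad user influences surrounding nodes more than a good user does, so the bad user's confidence weight is higher than the good user's. Likewise, an edge for a shared phone number or ID card number carries more weight than an edge for a shared IP address.\n3. The iteration efficiency of label propagation is improved: the message-passing mechanism between nodes is optimized to cut unnecessary network overhead.\n\n## Runtime Environment\n1. Spark 2.3.1+\n2. JDK 8+\n3. GraphX 2.3.1+\n4. Scala 2.11\n\n**By default, a Spark cluster already satisfies the requirements above at runtime; you also need to construct a Graph for the iteration.**\n\n## Usage\n### Graph Structure\nRDDs of the input graph:\n1. edge: Edge\\\u003cMap\\\u003cString,String\\\u003e\\\u003e\n2. vertices: Tuple2\\\u003cLong,Map\\\u003cString,String\\\u003e\\\u003e\n\n### Parameters\nThe main parameters of the LabelPropagationOps class are:\n1. graph: the input graph, structured as described above.\n2. maxInterator: the maximum number of iterations.\n3. directEnums: the direction of message propagation.\n4. 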
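weight: a weight Map, e.g. pass=0.3 is_parent=0.9\n5. labelKey: the key name of the label to propagate. For example, with label=pass and label=refuse, labelKey = label.\n6. type: the key that identifies the type used for edge weights and vertex weights, e.g. V(type=user, type=phone .....), edge(type=is_idCardNo....)\n\n### Running\n```\n  // Constructor and run() arguments are not shown in the original README;\n  // see the LabelPropagationOps source for the exact signatures.\n  LabelPropagationOps ops = new LabelPropagationOps();\n  ops.run();\n```\n### Result\nThe result is a graph whose attribute structure matches that of the input. Each vertex contains a key, propagateLabel, which holds the propagated label.\n\n\n## Notes on Graph Structure for Iteration\n1. If the input graph is directed and direction significantly affects the iteration results, set the directEnums parameter.\n2. If the graph is undirected, or direction is not particularly important, set EdgeDirection.Either.\n\n## Notes\nLPA is not guaranteed to converge, so a maximum number of iterations must be set. In general, choose it according to how many degrees of separation a node should be able to influence.\n\n\n## Performance\nProduction data volume: 100 million+ vertices and 4 billion+ edges. Labeled samples: 30 million+.\n\nEnvironment: 200 cores, 20 GB of memory per machine.\n\nIteration time (figure):\n\n\nEach iteration round takes roughly 35 minutes, and the time per round increases somewhat as the number of iterations grows.\n\n\nData at this scale cannot be iterated effectively in a single-machine Python or R implementation.\n\n## Copyright and License\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwuba%2Flpa-detector","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwuba%2Flpa-detector","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwuba%2Flpa-detector/lists"}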