{"id":19993331,"url":"https://github.com/jasonTangxd/recommendSys","last_synced_at":"2025-05-04T12:31:16.913Z","repository":{"id":42409984,"uuid":"62069806","full_name":"jasonTangxd/recommendSys","owner":"jasonTangxd","description":"Recommendation project (real-time and offline recommendation)","archived":false,"fork":false,"pushed_at":"2017-10-24T12:09:29.000Z","size":2151,"stargazers_count":248,"open_issues_count":0,"forks_count":115,"subscribers_count":21,"default_branch":"master","last_synced_at":"2024-11-11T10:45:13.645Z","etag":null,"topics":["hadoop","kafka","mahot","storm","toos"],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jasonTangxd.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-06-27T16:10:03.000Z","updated_at":"2024-10-20T13:56:33.000Z","dependencies_parsed_at":"2022-09-01T01:02:08.480Z","dependency_job_id":null,"html_url":"https://github.com/jasonTangxd/recommendSys","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jasonTangxd%2FrecommendSys","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jasonTangxd%2FrecommendSys/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jasonTangxd%2FrecommendSys/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jasonTangxd%2FrecommendSys/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jasonTangxd","download_url":"https://codeload.github.com/jasonTangxd/recommendSys/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"h
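ttps://github.com","kind":"github","repositories_count":252334332,"owners_count":21731387,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hadoop","kafka","mahot","storm","toos"],"created_at":"2024-11-13T04:52:35.567Z","updated_at":"2025-05-04T12:31:11.905Z","avatar_url":"https://github.com/jasonTangxd.png","language":"Java","funding_links":[],"categories":["Java"],"sub_categories":[],"readme":"# recommendSys\n- Recommender system\n- Offline and real-time computation\n\nThis project is divided into three main modules: WEB (data generation), offline, and real-time.\n\n## WEB (generates the data, i.e. user behavior data)\n1. User actions on items (view, browse, purchase) are recorded as ugcLOG\n2. flume collects the ugcLOG logs into HDFS\n\n\n## Offline processing (hadoop + mahout): user-based and item-based collaborative filtering\n1. A scheduled (oozie, crontab) job (mr) processes the ugcLOG on HDFS\n2. The cleaned data (user ID, itemID, rating) is passed to mahout\n3. After mahout processes the data, the output is the list of items for each user\n4. The resulting data is then imported via sqoop into a MySQL database or loaded into hive (for display on the web or for handoff to data analysts)\n\n1. Today's data: data from midnight today up to the time of the computation\n2. Earlier historical data: historical data up to midnight today\n\n\n## Real-time processing (kafka + storm): based on user and item tags\n1. Collect: collect the user feature vectors (the user-tag matrix), (userID tag1 tag2)\n2. Collect: collect the item feature vectors (the item-tag matrix), (itemID tag1 tag2 tag5)\n3. Compute: use 1 and 2 to compute the user-item feature values (matrix product)\n4. Filter: use the userID's item list to filter out items the user has already interacted with / filter by operational decisions / apply user-defined filters\n5. Rank: topN (including custom weights, e.g. to promote a particular product on weekends)\n\n1. Collect feature behavior data through the web (user tags, review data)\n2. Stream the collected data into kafka in real time\n3. Combine the feature behavior data with the user attribute data (from the database) to build the user feature vector\n4. Multiply the user feature vector by the item feature matrix (tags applied by users and by the system, weights, etc.) to obtain the matrix product\n5. Filter and compute topN\n\n\n\n\n\n# Blog\n[小小默: http://blog.xiaoxiaomo.com](http://blog.xiaoxiaomo.com)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FjasonTangxd%2FrecommendSys","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FjasonTangxd%2FrecommendSys","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FjasonTangxd%2FrecommendSys/lists"}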