{"id":20017381,"url":"https://github.com/gaussic/weibo_wordcloud","last_synced_at":"2025-06-27T10:08:20.002Z","repository":{"id":49371008,"uuid":"90234059","full_name":"gaussic/weibo_wordcloud","owner":"gaussic","description":"Crawl Weibo data by keyword, then generate a word cloud","archived":false,"fork":false,"pushed_at":"2018-01-25T05:26:35.000Z","size":1336,"stargazers_count":221,"open_issues_count":2,"forks_count":72,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-07T11:01:33.625Z","etag":null,"topics":["crawler","keyword","search","weibo","wordcloud"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gaussic.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-05-04T07:26:55.000Z","updated_at":"2025-03-13T09:30:38.000Z","dependencies_parsed_at":"2022-08-12T20:10:58.344Z","dependency_job_id":null,"html_url":"https://github.com/gaussic/weibo_wordcloud","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/gaussic/weibo_wordcloud","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gaussic%2Fweibo_wordcloud","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gaussic%2Fweibo_wordcloud/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gaussic%2Fweibo_wordcloud/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gaussic%2Fweibo_wordcloud/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gaussic","download_url":"https://codeload.github.com/gaussic/weibo_wordcloud/tar.gz/refs/heads/m
aster","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gaussic%2Fweibo_wordcloud/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262235783,"owners_count":23279567,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","keyword","search","weibo","wordcloud"],"created_at":"2024-11-13T08:15:44.856Z","updated_at":"2025-06-27T10:08:19.981Z","avatar_url":"https://github.com/gaussic.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Weibo Crawler and Word Cloud Visualization\n\n### Requirements\n\n- Python 3\n- requests\n- jieba\n- matplotlib\n- wordcloud\n- scipy\n\n### Crawler\n\nThe mobile web version of Weibo places few restrictions on crawlers, so part of the Weibo search data can be crawled directly. The search API is:\n\n```\nhttps://m.weibo.cn/api/container/getIndex?type=wb\u0026queryVal={}\u0026containerid=100103type=2%26q%3D{}\u0026page={}\n```\n\nThis API returns a limited amount of JSON data (see sample.json for the raw data). After processing, each record has the following format:\n\n```json\n{\n    \"mid\": \"4199434918992223\",\n    \"text\": \"【深度学习的终极形态】近期，院友袁进辉博士回到微软亚洲研究院做了题为《打造最强深度学习引擎》的报告，分享了深度学习框架方面的技术进展。他在报告中启发大家思考如何才能“鱼和熊掌兼得”，让软件发挥灵活性，硬件发挥高效率。我们整理了本次报告的重点，希望能对大家有所帮助！  ​...全文\",\n    \"userid\": \"1286528122\",\n    \"username\": \"微软亚洲研究院\",\n    \"reposts_count\": 21,\n    \"comments_count\": 1,\n    \"attitudes_count\": 9\n}\n```\n\nSee weibo_search.py for the full crawler.\n\n### Word Cloud\n\nThe word cloud can be built with wordcloud. The basic steps are:\n\n1. Word segmentation and keyword extraction: Chinese text must be segmented and stripped of its many stop words, for example 你 (you), 我 (I), 他 (he), 这是 (this is),\nso that the generated word cloud is more meaningful. jieba's TF-IDF keyword extraction handles this step directly.\n\n2. 
wordcloud takes a string and a base image as input. Join the keywords from step 1 with spaces to form the string.\nFor the base image, prefer one with a white background and no other backdrop, so that the generated image follows the original more closely.\n\nSee weibo_cloud.py for the code.\n\n\n\n### Examples\n\nKeyword: iPhone\n\n![apple](apple_wc.png)\n\n\nKeyword: 微软 (Microsoft)\n\n![microsoft](edge_wc.png)\n\nKeyword: 谷歌 (Google)\n\n![google](google_wc.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgaussic%2Fweibo_wordcloud","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgaussic%2Fweibo_wordcloud","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgaussic%2Fweibo_wordcloud/lists"}