{"id":13671031,"url":"https://github.com/SmileSmith/tiny-red-book","last_synced_at":"2025-04-27T13:33:29.967Z","repository":{"id":91222749,"uuid":"148957091","full_name":"SmileSmith/tiny-red-book","owner":"SmileSmith","description":"Xiaohongshu (Little Red Book) data scraping","archived":false,"fork":false,"pushed_at":"2019-02-18T10:05:16.000Z","size":38,"stargazers_count":171,"open_issues_count":3,"forks_count":37,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-04-07T18:52:51.663Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SmileSmith.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-09-16T02:41:34.000Z","updated_at":"2025-02-24T16:53:04.000Z","dependencies_parsed_at":"2023-03-28T23:37:59.561Z","dependency_job_id":null,"html_url":"https://github.com/SmileSmith/tiny-red-book","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SmileSmith%2Ftiny-red-book","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SmileSmith%2Ftiny-red-book/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SmileSmith%2Ftiny-red-book/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SmileSmith%2Ftiny-red-book/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SmileSmith","download_url":"https://codeload.github.com/SmileSmith/tiny-red-book/tar.gz/refs/heads/master","host":{"name":"GitHub",
"url":"https://github.com","kind":"github","repositories_count":251145847,"owners_count":21543108,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T09:00:56.370Z","updated_at":"2025-04-27T13:33:24.957Z","avatar_url":"https://github.com/SmileSmith.png","language":"JavaScript","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"readme":"# Xiaohongshu (Little Red Book) Data Scraping\n\n## Overall approach\n\n1. First, fetch 100 feeds from `homeFeed` as the seed data.\n\n2. Next, fetch the HTML of each feed's detail page and parse the `topicId` out of it.\n\n3. Then fetch each topic's details: the related topics listed in the topic page's HTML, plus the topic name, like count, view count, etc. returned by the API.\n\n4. Repeat steps 2 and 3 in a loop.\n\n## Built on superagent and Puppeteer\n\nBecause Xiaohongshu's detail pages are protected by several anti-scraping mechanisms, for now Puppeteer is simply used to simulate the page requests and capture the `_at` parameter.\n\n## Quick start\n\n1. In the project root, run:\n\n```bash\nnpm install\n```\n\n2. Start the MongoDB service and create `./config/db.js`, for example:\n\n```javascript\nmodule.exports = {\n  user: 'tinyredbook', // MongoDB username\n  pwd: 'xxxxxx', // MongoDB password\n  host: '127.0.0.1', // MongoDB host address (port defaults to 27017)\n};\n```\n\n3. In the project root, run:\n\n```bash\nnode index.js\n```\n\n## Xiaohongshu app SIGN algorithm, Python version (kept as a backup; may be useful later)\n\n```python\n# coding: utf-8\nimport hashlib\n\n\ndef md5hex(word):\n    # Return the hex MD5 digest of word (str, bytes, or any value convertible with str())\n    if isinstance(word, str):\n        word = word.encode(\"utf-8\")\n    elif not isinstance(word, bytes):\n        word = str(word).encode(\"utf-8\")\n    m = hashlib.md5()\n    m.update(word)\n    return m.hexdigest()\n\n\n# Parameter names; get_sign concatenates them in this order\nparamas_name=[\n    'android_id',\n    'channel',\n    'deviceId',\n    'device_fingerprint',\n    'imei',\n    'lang',\n    'password',\n    'phone',\n    'platform',\n    'sid',\n    'start',\n    't',\n    'type',\n    'versionName',\n    'zone'\n    ]\n\n\n# Pass the parameter values as a list in the same order as paramas_name;\n# use '' for any parameter that is not sent\ndef get_sign(paramas_value):\n    key=''\n    for index,item in enumerate(paramas_value):\n        if item!='':\n            # '%3D' is the URL-encoded '='\n            key=key+paramas_name[index]+'%3D'+item\n    deviceId=paramas_value[2]\n    v1_2 = bytearray(key, 'utf-8')\n    v5_1 = ''\n    v3_2 = 0\n    v2 = 0\n    v4_1=bytearray(deviceId, 'utf-8')\n\n    # XOR each byte of key with the deviceId bytes (cycled) and append the decimal result\n    while v2\u003clen(v1_2):\n        v5_1 = v5_1 + str(v1_2[v2] ^ v4_1[v3_2])\n        v3_2 = (v3_2 + 1) % len(v4_1)\n        v2 = v2 + 1\n\n    sign=md5hex(md5hex(v5_1)+deviceId)\n    return sign\n\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSmileSmith%2Ftiny-red-book","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSmileSmith%2Ftiny-red-book","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSmileSmith%2Ftiny-red-book/lists"}