https://github.com/SmileSmith/tiny-red-book

Xiaohongshu (Little Red Book) data scraping

Host: GitHub
URL: https://github.com/SmileSmith/tiny-red-book
Owner: SmileSmith
License: mit
Created: 2018-09-16T02:41:34.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2019-02-18T10:05:16.000Z (over 7 years ago)
Last Synced: 2025-04-07T18:52:51.663Z (about 1 year ago)
Language: JavaScript
Size: 37.1 KB
Stars: 171
Watchers: 7
Forks: 37
Open Issues: 3
Metadata Files:
- Readme: readme.md
- License: LICENSE

Awesome Lists containing this project

awesome-github-star - tiny-red-book

README

# Xiaohongshu (Little Red Book) data scraping

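## Overall approach

1. Fetch 100 feeds from homeFeed as the base data set.

2. Fetch the HTML of each feed's detail page and parse the topicId out of it.

3. Fetch the topic details: the related topics come from the topic page HTML, while the topic name, like count, view count, and so on come from the API.

4. Repeat steps 2 and 3.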
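The steps above can be read as a breadth-first crawl over topics. The sketch below is one possible interpretation, not the repository's actual code: `crawlTopics`, the three injected fetchers (`fetchHomeFeed`, `fetchTopicIdFromFeed`, `fetchTopic`), the `relatedTopicIds` field, and the `maxTopics` cap are all hypothetical names introduced for illustration.

```javascript
// Sketch of the crawl loop. The three fetchers are hypothetical and are
// injected by the caller; they are not functions from this repository.
async function crawlTopics({ fetchHomeFeed, fetchTopicIdFromFeed, fetchTopic, maxTopics = 1000 }) {
  // Step 1: seed the crawl with 100 feeds from the home feed.
  const feeds = await fetchHomeFeed(100);

  // Step 2: resolve each feed's detail page to a topicId (may be missing).
  const queue = [];
  for (const feed of feeds) {
    const topicId = await fetchTopicIdFromFeed(feed);
    if (topicId) queue.push(topicId);
  }

  // Steps 3 and 4: visit each topic once and enqueue its related topics.
  const seen = new Map();
  while (queue.length > 0 && seen.size < maxTopics) {
    const topicId = queue.shift();
    if (seen.has(topicId)) continue; // already crawled
    const topic = await fetchTopic(topicId);
    seen.set(topicId, topic);
    for (const related of topic.relatedTopicIds || []) {
      if (!seen.has(related)) queue.push(related);
    }
  }
  return seen; // Map of topicId -> topic details
}
```

Tracking visited topics in a `Map` keeps the loop from cycling when two topics list each other as related, and the `maxTopics` cap gives the crawl a stopping point.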

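## Built on superagent and Puppeteer

Xiaohongshu detail pages are protected by several layers of anti-scraping measures, so for now the scraper simply uses Puppeteer to simulate the page request and obtain the `_at` parameter.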
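As a rough illustration of that idea, the sketch below listens to the requests a page makes and picks out the first `_at` value it sees. It assumes `_at` shows up as a query-string parameter on an outgoing request, which the README does not confirm; `extractAtParam` and `captureAtParam` are hypothetical helper names, and `page` is expected to be a Puppeteer `Page` (for example from `browser.newPage()`).

```javascript
// Pull the `_at` query parameter out of a request URL, or return null.
function extractAtParam(requestUrl) {
  try {
    return new URL(requestUrl).searchParams.get('_at');
  } catch (err) {
    return null; // not a parseable absolute URL
  }
}

// Load `detailUrl` in the given Puppeteer page and return the first `_at`
// value seen on any outgoing request (assumption: it is a query parameter).
async function captureAtParam(page, detailUrl) {
  let at = null;
  const onRequest = (request) => {
    if (at === null) at = extractAtParam(request.url());
  };
  page.on('request', onRequest);
  try {
    await page.goto(detailUrl, { waitUntil: 'networkidle2' });
  } finally {
    page.off('request', onRequest); // always detach the listener
  }
  return at;
}
```

The captured value could then be attached to a plain HTTP request, for example with superagent's `.query({ _at: at })`, so the browser is only needed for the step the anti-scraping checks guard.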

## Quick start

1. In the project root, run:

```shell
npm install
```

2. Start the MongoDB service and create ./config/db.js, for example:

```javascript
module.exports = {
  user: 'tinyredbook', // MongoDB username
  pwd: 'xxxxxx', // MongoDB password
  host: '127.0.0.1', // MongoDB host address; the port defaults to 27017
};
```
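A config of this shape would typically be turned into a MongoDB connection string. The helper below is a minimal sketch of that step, not code from the repository: `buildMongoUri` is a hypothetical name, and the database name `tinyredbook` is an assumption, since the README only specifies the user, password, and host.

```javascript
// Build a MongoDB connection string from the ./config/db.js shape.
// `dbName` is an assumption; `port` follows the default noted in the config.
function buildMongoUri(config, dbName = 'tinyredbook', port = 27017) {
  // Percent-encode credentials so characters like '@' or ':' stay valid.
  const user = encodeURIComponent(config.user);
  const pwd = encodeURIComponent(config.pwd);
  return `mongodb://${user}:${pwd}@${config.host}:${port}/${dbName}`;
}
```

Encoding the credentials matters because a password containing `@` or `:` would otherwise be misread as part of the host section of the URI.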

3. In the project root, run:

```shell
node index.js
```

## Xiaohongshu app SIGN algorithm, Python version (kept as a backup; may come in handy)

```python
# coding: utf-8
import hashlib


def md5hex(word):
    """Return the hex MD5 digest of `word`, encoding text as UTF-8 first."""
    if isinstance(word, str):
        word = word.encode("utf-8")
    elif not isinstance(word, bytes):
        word = str(word).encode("utf-8")
    m = hashlib.md5()
    m.update(word)
    return m.hexdigest()

# Parameter names
paramas_name = [
    'android_id',
    'channel',
    'deviceId',
    'device_fingerprint',
    'imei',
    'lang',
    'password',
    'phone',
    'platform',
    'sid',
    'start',
    't',
    'type',
    'versionName',
    'zone',
]

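# Pass the parameter values in the same order as paramas_name;
# use an empty string for any parameter that is absent.
def get_sign(paramas_value):
    # Concatenate name%3Dvalue pairs ('%3D' is a URL-encoded '=').
    key = ''
    for index, item in enumerate(paramas_value):
        if item != '':
            key = key + paramas_name[index] + '%3D' + item
    deviceId = paramas_value[2]
    v1_2 = bytearray(key, 'utf-8')
    v5_1 = ''
    v3_2 = 0
    v2 = 0
    v4_1 = bytearray(deviceId, 'utf-8')
    while v2
```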
