Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sumingcheng/go-crawler
Extracts Zhihu article information
cookies go playwright zhihu
- Host: GitHub
- URL: https://github.com/sumingcheng/go-crawler
- Owner: sumingcheng
- License: apache-2.0
- Created: 2024-08-27T07:23:53.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-12-12T09:02:24.000Z (about 2 months ago)
- Last Synced: 2024-12-12T09:35:20.143Z (about 2 months ago)
- Topics: cookies, go, playwright, zhihu
- Language: Go
- Homepage:
- Size: 123 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# zhihu-crawler
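## Project purpose
I used to write articles on Zhihu, but I no longer want to write there. I am crawling all of my articles so that I can write on my own blog from now on. That is why this project exists.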
## Steps
Use `cookieEdit` to export your cookies, then copy them into the `zhihu.json` file in the project root. If `zhihu.json` does not exist, create it.
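Before starting the crawler, it can help to confirm that `zhihu.json` parses correctly. The following is a minimal sketch, not the project's own loader. It assumes the export is a JSON array of cookie objects with fields such as `name`, `value`, `domain`, and `path`; the `Cookie` type and `parseCookies` function are hypothetical names introduced for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Cookie mirrors a subset of the fields assumed to appear in the exported JSON.
// The field names are an assumption about the export format.
type Cookie struct {
	Name           string  `json:"name"`
	Value          string  `json:"value"`
	Domain         string  `json:"domain"`
	Path           string  `json:"path"`
	Secure         bool    `json:"secure"`
	HTTPOnly       bool    `json:"httpOnly"`
	ExpirationDate float64 `json:"expirationDate"`
}

// parseCookies decodes a JSON array of cookie objects (hypothetical helper).
func parseCookies(data []byte) ([]Cookie, error) {
	var cookies []Cookie
	if err := json.Unmarshal(data, &cookies); err != nil {
		return nil, fmt.Errorf("decode cookies: %w", err)
	}
	return cookies, nil
}

func main() {
	// Read the cookie file from the current directory (the project root).
	data, err := os.ReadFile("zhihu.json")
	if err != nil {
		fmt.Println("cannot read zhihu.json:", err)
		return
	}
	cookies, err := parseCookies(data)
	if err != nil {
		fmt.Println("zhihu.json is not valid:", err)
		return
	}
	fmt.Printf("loaded %d cookies\n", len(cookies))
}
```

Unknown fields in the export are ignored by `json.Unmarshal`, so the struct only needs the fields you care about.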
Then run the project:
```
go run ./cmd/main.go
```
Trigger the crawl through the API:
```
curl --location --request POST 'http://127.0.0.1:12345/api/crawler/zhihu'
```
The project depends on MySQL, where the crawled content is stored. You can export it directly from the table.
![image-20241212165806131](assets/image-20241212165806131.png)
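The crawl trigger shown with curl above can also be issued from Go. This is a minimal sketch that sends the same POST request to the same address. The helper name `newCrawlRequest` and the timeout value are assumptions introduced here, and the response format is not documented in this README, so only the status line is printed.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newCrawlRequest builds the same POST request as the curl command
// (hypothetical helper, not part of the project).
func newCrawlRequest(baseURL string) (*http.Request, error) {
	return http.NewRequest(http.MethodPost, baseURL+"/api/crawler/zhihu", nil)
}

func main() {
	req, err := newCrawlRequest("http://127.0.0.1:12345")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	// The timeout is an assumption; a crawl may need more or less time.
	client := &http.Client{Timeout: 5 * time.Minute}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed (is the service running?):", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```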
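## Exporting articles
This project uses a headless browser to crawl Zhihu article information, then uses https://github.com/chenluda/zhihu-download to download the article content. See that project for details.
You can also use the copy of `zhihu-download` in the root directory directly. It includes some small optimizations that make logging and downloading more convenient.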