https://github.com/yinleicoder/linux-command-crawler
Linux命令行手册[爬虫]🤡
https://github.com/yinleicoder/linux-command-crawler
chrome linux mongodb python serveless wechat
Last synced: about 2 months ago
JSON representation
Linux命令行手册[爬虫]🤡
- Host: GitHub
- URL: https://github.com/yinleicoder/linux-command-crawler
- Owner: yinleiCoder
- License: apache-2.0
- Created: 2020-10-04T07:12:00.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2020-11-12T11:57:50.000Z (over 4 years ago)
- Last Synced: 2025-02-14T22:34:17.416Z (4 months ago)
- Topics: chrome, linux, mongodb, python, serveless, wechat
- Language: JavaScript
- Homepage:
- Size: 84 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Linux命令行手册[爬虫]:clown_face:
- 数据来源:https://www.linuxcool.com/
- 数据存储:mongodb:https://www.mongodb.com/cloud
- 后台:Servless服务[腾讯云服务]
- python: 爬虫(xpath语法)
- 前端:微信小程序
- 桌面端:Electron
### chrome爬取【正则】:
```javascript
let data = []
document.querySelectorAll('section ul a').forEach((item)=>{
let it = {"name":item.href.split('/')[item.href.split('/').length-1]}
data.push(it)
dealSub(item.href, it)
})function dealSub(url, it){
it.params=[];
fetch(url).then(res=>res.text()).then(txt=>{
it.usage=txt.match(/语法格式(.*<\/p>)/gm)[0].match(/<\/strong>(.*)<\/p>/)[1]
let tb = txt.match(/(.*)<\/tbody>/gm)[0]
let td = tb.match(/(((?!<\/?td>).)*)<\/td>/g)
for(let i=0;i>td.length;i=i+2){
let param = td[i].match(/(.*)<\/td>/)[1].trim()
let content = td[i+1].match(/(.*)<\/td>/)[1].trim()
it.params.push({param,content})
}
})
}//js下载文本
let dataStr = JSON.stringify(data)
let filename='data.json'
let a = document.createElement('a')
let blob = new Blob([dataStr])
a.download=filename
a.href=URL.createObjectURL(blob)
a.click()
URL.revokeObjectURL(blob)
```### 在线的Mongodb云环境:
https://cloud.mongodb.com/v2/5f842185268d972bc579cbe8#metrics/replicaSet/5f842367b9ff202e2033833f/explorer/test/linux-cmd/find
新建项目-》新建集群-》将最开始爬取下的数据保存到json文件里的数据拷贝并粘贴插入到在线Mongodb云环境的Document里即可。
准备好后,按照它的示例代码连接云mongodb环境。
## 腾讯云函数搭建后台服务:
Node环境[还可以使用其他语言环境],并将本地的代码上传到腾讯云函数上。代码见mongodb-server文件夹.
搭建成功访问:
1. https://service-osrdenkx-1301156794.cd.apigw.tencentcs.com/release/linux-cmd?type=byName&name=linux的命令
e.g: https://service-f57fq73b-1301156794.cd.apigw.tencentcs.com/release/linux-cmd?type=byName&name=ls
2. https://service-f57fq73b-1301156794.cd.apigw.tencentcs.com/release/linux-cmd?type=getAllName### 微信小程序:
这里写的微信小程序很简单,只是简单的调用了上面搭建好的腾讯云后端服务,不过需要注意的是,需要将自己的腾讯云函数域名添加到小程序服务器域名白名单。
微信小程序官网-》登录-》开发-》开发设置-》服务器域名-》添加域名到request合法域名。