Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/andycall/page-hook-generator
auto get page selector of page title, content and publishTime
https://github.com/andycall/page-hook-generator
Last synced: 6 days ago
JSON representation
auto get page selector of page title, content and publishTime
- Host: GitHub
- URL: https://github.com/andycall/page-hook-generator
- Owner: andycall
- Created: 2016-12-06T07:25:45.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2017-04-01T05:46:38.000Z (over 7 years ago)
- Last Synced: 2024-10-19T04:30:40.615Z (2 months ago)
- Language: JavaScript
- Size: 1.91 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# page-hook-generator
通用解析算法,可识别任意新闻站点的页面结构,识别出标题和内容区域块## How it work
此插件仅支持分析新闻类网站。新闻类网站的生成通常都是通过后台的富文本编辑器生成
但是不同的富文本编辑器之间都有一个相同的特性 ---- 生成的HTML结构平整,嵌套结构层级简单,同时一个新闻最重要的是: 文字多。## try by yourself
[page-hook-generator-plugin](./demo/page-hook-generator.crx)
去百度feed上随便找个新闻 https://www.baidu.com/
点下这个按钮 ![button](./demo/icon.png)
然后就能得到结果:
![screenshot](./demo/screenshot.png)# How to test
```bash
npm run test
```# How to install
```
npm install page-hook-generator
```# How to use
```javascript
var generator = require('page-hook-generator');
var request = require('request');request({
url: 'http://tech.hexun.com/2016-12-06/187208491.html',
method: 'GET'
}, function(err, response, str) {
if (!err && response.statusCode === 200) {
let info = generator(str);
console.log('the title selector is', info.title);
console.log('the content selector is', info.content);
console.log('the page created time is', info.date);
}
})```