Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/pysrc/bs
Parse HTML in Go, in the style of BeautifulSoup
beautifulsoup go html parse
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/pysrc/bs
- Owner: pysrc
- License: mit
- Created: 2017-12-19T06:57:43.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2018-09-08T16:08:30.000Z (over 6 years ago)
- Last Synced: 2024-06-19T16:33:27.469Z (7 months ago)
- Topics: beautifulsoup, go, html, parse
- Language: Go
- Homepage:
- Size: 9.77 KB
- Stars: 9
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: readme.md
- License: LICENSE
README
# A Simple HTML-Parser
**Install:** `go get github.com/pysrc/bs`
## Quick start
```go
package main

import (
	"fmt"

	"github.com/pysrc/bs"
)

var html_doc = `
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
`

func main() {
	soup := bs.Init(html_doc)

	// Print the href of every <a> tag
	for i, j := range soup.SelByTag("a") {
		fmt.Println(i, (*j.Attrs)["href"])
	}
	/*Output:
	0 http://example.com/elsie
	1 http://example.com/lacie
	2 http://example.com/tillie
	*/

	// Select the <p> tags whose class attribute is "story"
	for i, j := range soup.Sel("p", &map[string]string{"class": "story"}) {
		fmt.Println(i, "Tag", j.Tag)
		// Find the <a> tags nested inside this <p>
		for k, v := range j.SelByTag("a") {
			fmt.Println(k, "son", v.Value)
		}
	}
	/*Output:
	0 Tag p
	0 son Elsie
	1 son Lacie
	2 son Tillie
	1 Tag p
	*/

	for _, j := range soup.SelById("lin.*") { // the id is matched as a regular expression
		fmt.Println("regex", j.Tag, j.Value)
	}
	/*Output:
	regex a Elsie
	regex a Lacie
	regex a Tillie
	*/

	// Parse directly from a URL
	soup = bs.Init("https://github.com/")
	for _, j := range soup.Sel("title", nil) {
		fmt.Println("title:", j.Value)
	}
	/*Output:
	title: The world’s leading software development platform · GitHub
	title: 1clr-code-hosting
	*/
}
```
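The quick start reads attributes with `(*j.Attrs)["href"]`, which dereferences a pointer to a map. This README does not say whether the library ever leaves `Attrs` nil, so the following is only a defensive sketch: `attr` is a hypothetical helper (not part of `bs`), and `Node` is a local stand-in that copies the fields documented in this README so the example compiles without the library.

```go
package main

import "fmt"

// Node is a local stand-in with the fields this README documents for the
// library's node type. It is declared here only to keep the sketch
// self-contained; it is not the library's own definition.
type Node struct {
	Tag   string
	Attrs *map[string]string
	Value string
	Sons  []*Node
}

// attr is a hypothetical helper: it returns the named attribute and whether
// it was present, and tolerates a nil node or a nil Attrs pointer instead of
// panicking on the dereference.
func attr(n *Node, key string) (string, bool) {
	if n == nil || n.Attrs == nil {
		return "", false
	}
	v, ok := (*n.Attrs)[key]
	return v, ok
}

func main() {
	link := &Node{
		Tag:   "a",
		Attrs: &map[string]string{"href": "http://example.com/elsie"},
		Value: "Elsie",
	}
	bare := &Node{Tag: "b", Value: "no attributes"} // Attrs left nil on purpose

	for _, n := range []*Node{link, bare} {
		if href, ok := attr(n, "href"); ok {
			fmt.Println(n.Tag, "href:", href)
		} else {
			fmt.Println(n.Tag, "has no href")
		}
	}
}
```

With the real package, the same helper would presumably take the node type returned by `SelByTag` in place of the local `Node`.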
## Available API
### 1. Methods
```go
type SelFunc interface {
Sel(tag string, attrs *map[string]string) (nodes []*Node)
SelById(id string) []*Node
SelByTag(tag string) []*Node
SelByClass(class string) []*Node
SelByName(name string) []*Node
}
```
### 2. Accessible fields
```go
type Node struct { // basic node structure
	Tag   string             // tag name
	Attrs *map[string]string // attributes
	Value string             // value of this node
	Sons  []*Node            // child nodes
}
```
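Since every node carries its children in `Sons`, a parsed document can be traversed recursively. The sketch below shows one way to do that. It assumes only the field layout shown above: `Node` is redeclared locally so the example runs without the library, and `walk` and `outline` are hypothetical helpers, not functions provided by `bs`. How the library fills `Sons` and `Value` for mixed text and markup is not specified here, so the tree is built by hand.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a local copy of the struct documented above, declared here only
// so the sketch is self-contained.
type Node struct {
	Tag   string
	Attrs *map[string]string
	Value string
	Sons  []*Node
}

// walk visits n and then each of its descendants depth-first, passing the
// current nesting depth to visit. A nil node is skipped.
func walk(n *Node, depth int, visit func(n *Node, depth int)) {
	if n == nil {
		return
	}
	visit(n, depth)
	for _, son := range n.Sons {
		walk(son, depth+1, visit)
	}
}

// outline renders the tag structure as one line per node, indented by two
// spaces per nesting level.
func outline(root *Node) string {
	var sb strings.Builder
	walk(root, 0, func(n *Node, depth int) {
		sb.WriteString(strings.Repeat("  ", depth))
		sb.WriteString(n.Tag)
		sb.WriteString("\n")
	})
	return sb.String()
}

func main() {
	// A hand-built tree standing in for a parsed <p> with two links.
	root := &Node{Tag: "p", Sons: []*Node{
		{Tag: "a", Value: "Elsie"},
		{Tag: "a", Value: "Lacie"},
	}}
	fmt.Print(outline(root))
}
```

Passing the visitor as a function keeps the traversal separate from what is done at each node, so the same `walk` could collect values or attributes instead of building an outline.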