https://github.com/mmatongo/chew
A Go library for processing various content types into markdown.
- Host: GitHub
- URL: https://github.com/mmatongo/chew
- Owner: mmatongo
- License: mit
- Created: 2024-07-08T13:11:09.000Z (6 months ago)
- Default Branch: master
- Last Pushed: 2024-07-09T07:21:58.000Z (6 months ago)
- Last Synced: 2024-07-09T08:43:22.193Z (6 months ago)
- Language: Go
- Size: 12.7 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
[![Go Report Card](https://goreportcard.com/badge/github.com/mmatongo/chew)](https://goreportcard.com/report/github.com/mmatongo/chew)
[![GoDoc](https://godoc.org/github.com/mmatongo/chew?status.svg)](https://pkg.go.dev/github.com/mmatongo/chew)
[![Maintainability](https://api.codeclimate.com/v1/badges/441cfd36f310c0c48878/maintainability)](https://codeclimate.com/github/mmatongo/chew/maintainability)
[![codecov](https://codecov.io/github/mmatongo/chew/graph/badge.svg?token=6OOK91QQRC)](https://codecov.io/github/mmatongo/chew)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](./LICENSE)
A Go library for processing various content types into markdown or plaintext.
*Chew* is a Go library that processes various content types into markdown or plaintext. It supports multiple content types, including HTML, PDF, CSV, JSON, YAML, DOCX, PPTX, Markdown, Plaintext, MP3, FLAC, and WAVE.
```bash
go get github.com/mmatongo/chew
```
Here's a basic example of how to use Chew:
```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"time"

	"github.com/mmatongo/chew/v1"
)

func main() {
	urls := []string{
		"https://example.com",
	}

	config := chew.Config{
		UserAgent:       "Chew/1.0 (+https://github.com/mmatongo/chew)",
		RetryLimit:      3,
		RetryDelay:      5 * time.Second,
		CrawlDelay:      10 * time.Second,
		ProxyList:       []string{}, // Add your proxies here, or leave empty
		RateLimit:       2 * time.Second,
		RateBurst:       3,
		IgnoreRobotsTxt: false,
	}

	haChew := chew.New(config)

	// The context is optional, but can be used to cancel the operation after a certain time
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	chunks, err := haChew.Process(ctx, urls)
	if err != nil {
		// errors.Is also matches a deadline error that has been wrapped
		if errors.Is(err, context.DeadlineExceeded) {
			log.Println("Operation timed out")
		} else {
			log.Printf("Error processing URLs: %v", err)
		}
		return
	}

	for _, chunk := range chunks {
		fmt.Printf("Source: %s\nContent: %s\n\n", chunk.Source, chunk.Content)
	}
}
```
Output
```bash
Source: https://example.com
Content: Example Domain

Source: https://example.com
Content: This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.

Source: https://example.com
Content: More information...
```
You can find more examples in the [examples](./examples) directory as well as instructions on how to use Chew with Ruby and Python.
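Because a single URL can yield several chunks, as in the output above, it is often handy to stitch them back together per source. The following is a minimal sketch of that idea, not part of Chew itself: `Chunk` is a local stand-in type (an assumption) that mirrors only the `Source` and `Content` fields used in the example, and `joinBySource` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a local stand-in (assumption) mirroring only the Source and
// Content fields seen in the example; it is not the chew type itself.
type Chunk struct {
	Source  string
	Content string
}

// joinBySource (hypothetical helper) concatenates chunk contents per source,
// separated by blank lines. It returns the sources in first-seen order
// together with a map from source to joined text.
func joinBySource(chunks []Chunk) ([]string, map[string]string) {
	var order []string
	parts := make(map[string][]string)
	for _, c := range chunks {
		if _, seen := parts[c.Source]; !seen {
			order = append(order, c.Source)
		}
		parts[c.Source] = append(parts[c.Source], c.Content)
	}

	docs := make(map[string]string, len(parts))
	for src, p := range parts {
		docs[src] = strings.Join(p, "\n\n")
	}
	return order, docs
}

func main() {
	chunks := []Chunk{
		{Source: "https://example.com", Content: "Example Domain"},
		{Source: "https://example.com", Content: "More information..."},
	}

	order, docs := joinBySource(chunks)
	for _, src := range order {
		fmt.Printf("# %s\n%s\n", src, docs[src])
	}
}
```

To use this with real results, the fields of each chunk returned by `Process` would be copied into the stand-in type, or the helper adapted to whatever chunk type the library actually returns.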
Contributions are welcome! Feel free to open an issue or submit a pull request if you have any suggestions or improvements.
This project is licensed under the MIT License - see the [LICENSE](./LICENSE) file for details.
The [logo](https://github.com/MariaLetta/free-gophers-pack) was made by the amazing [MariaLetta](https://github.com/MariaLetta).
### Similar Projects
[docconv](https://github.com/sajari/docconv)
### Roadmap
The roadmap for this project is available [here](./TODO.md). It's meant more as a guide than a strict plan because I only work on this project in my free time.