https://github.com/mmatongo/chew
Chew is a Go library for processing various content types into markdown/plaintext.
- Host: GitHub
- URL: https://github.com/mmatongo/chew
- Owner: mmatongo
- License: mit
- Created: 2024-07-08T13:11:09.000Z (almost 2 years ago)
- Default Branch: v1
- Last Pushed: 2025-02-16T12:31:54.000Z (over 1 year ago)
- Last Synced: 2025-03-28T08:51:18.266Z (over 1 year ago)
- Topics: go, golang, llm
- Language: Go
- Homepage:
- Size: 1.52 MB
- Stars: 41
- Watchers: 3
- Forks: 2
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
README
A Go library for processing various content types into markdown/plaintext.
*Chew* is a Go library that processes various content types into markdown or plaintext. It supports multiple content types, including HTML, PDF, CSV, JSON, YAML, DOCX, PPTX, Markdown, Plaintext, MP3, FLAC, and WAVE.
```bash
go get github.com/mmatongo/chew
```
Here's a basic example of how to use Chew:
```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"time"

	"github.com/mmatongo/chew/v1"
)

func main() {
	urls := []string{
		"https://example.com",
	}

	config := chew.Config{
		UserAgent:       "Chew/1.0 (+https://github.com/mmatongo/chew)",
		RetryLimit:      3,
		RetryDelay:      5 * time.Second,
		CrawlDelay:      10 * time.Second,
		ProxyList:       []string{}, // Add your proxies here, or leave empty
		RateLimit:       2 * time.Second,
		RateBurst:       3,
		IgnoreRobotsTxt: false,
	}

	haChew := chew.New(config)

	// A context with a timeout cancels the operation if it runs too long.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	chunks, err := haChew.Process(ctx, urls)
	if err != nil {
		// errors.Is also matches a deadline error that has been wrapped.
		if errors.Is(err, context.DeadlineExceeded) {
			log.Println("Operation timed out")
		} else {
			log.Printf("Error processing URLs: %v", err)
		}
		return
	}

	// Each chunk carries the extracted content and the source it came from.
	for _, chunk := range chunks {
		fmt.Printf("Source: %s\nContent: %s\n\n", chunk.Source, chunk.Content)
	}
}
```
Output
```text
Source: https://example.com
Content: Example Domain

Source: https://example.com
Content: This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.

Source: https://example.com
Content: More information...
```
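As the output shows, a single URL can yield several chunks that share the same source. If you would rather have one block of text per source, the chunks can be merged after processing. The following is a minimal sketch, not part of Chew: `Chunk` is a local stand-in that mirrors only the `Source` and `Content` fields used in the example, and `Document` and `groupBySource` are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a local stand-in that mirrors only the two fields used in the
// usage example (Source and Content). It is not Chew's own type.
type Chunk struct {
	Source  string
	Content string
}

// Document (hypothetical) holds all the text extracted from one source.
type Document struct {
	Source string
	Text   string
}

// groupBySource joins the content of chunks that share a Source. Sources
// keep their first-seen order and chunks keep their original order.
func groupBySource(chunks []Chunk) []Document {
	index := make(map[string]int) // source -> position in docs
	var docs []Document
	var parts [][]string

	for _, c := range chunks {
		i, seen := index[c.Source]
		if !seen {
			i = len(docs)
			index[c.Source] = i
			docs = append(docs, Document{Source: c.Source})
			parts = append(parts, nil)
		}
		parts[i] = append(parts[i], c.Content)
	}

	// Separate chunks with a blank line so paragraphs stay distinct.
	for i := range docs {
		docs[i].Text = strings.Join(parts[i], "\n\n")
	}
	return docs
}

func main() {
	chunks := []Chunk{
		{Source: "https://example.com", Content: "Example Domain"},
		{Source: "https://example.com", Content: "More information..."},
	}
	for _, doc := range groupBySource(chunks) {
		fmt.Printf("# %s\n\n%s\n", doc.Source, doc.Text)
	}
}
```

With real results, you would copy `chunk.Source` and `chunk.Content` from each returned chunk into the stand-in type, or adapt the function to take Chew's chunk type directly.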
You can find more examples in the [examples](./examples) directory, along with instructions for using Chew from Ruby and Python.
Contributions are welcome! Feel free to open an issue or submit a pull request if you have any suggestions or improvements.
This project is licensed under the MIT License - see the [LICENSE](./LICENSE) file for details.
The [logo](https://github.com/MariaLetta/free-gophers-pack) was made by the amazing [MariaLetta](https://github.com/MariaLetta).
### Similar Projects
[docconv](https://github.com/sajari/docconv)
### Roadmap
The roadmap for this project is available [here](./TODO.md). It's meant more as a guide than a strict plan because I only work on this project in my free time.