{"id":13582630,"url":"https://github.com/go-shiori/go-readability","last_synced_at":"2025-05-14T08:09:40.275Z","repository":{"id":39120449,"uuid":"118311747","full_name":"go-shiori/go-readability","owner":"go-shiori","description":"Go package that cleans a HTML page for better readability.","archived":false,"fork":false,"pushed_at":"2025-04-01T08:39:14.000Z","size":5055,"stargazers_count":797,"open_issues_count":15,"forks_count":90,"subscribers_count":17,"default_branch":"master","last_synced_at":"2025-04-14T22:05:43.137Z","etag":null,"topics":["go","golang","readability"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/go-shiori.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-01-21T06:49:12.000Z","updated_at":"2025-04-14T10:56:57.000Z","dependencies_parsed_at":"2022-08-01T07:58:59.382Z","dependency_job_id":"673c7b16-24f5-4dc0-b58e-397ab604440c","html_url":"https://github.com/go-shiori/go-readability","commit_stats":{"total_commits":178,"total_committers":14,"mean_commits":"12.714285714285714","dds":0.1235955056179775,"last_synced_commit":"92284fa8a71f48f6b8321e41898e12480ec0b7e1"},"previous_names":["radhifadlillah/go-readability"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/go-shiori%2Fgo-readability","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/go-shiori%2Fgo-readability/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/go-shiori%2Fgo-readability/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/go-shiori%2Fgo-readability/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/go-shiori","download_url":"https://codeload.github.com/go-shiori/go-readability/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254101558,"owners_count":22014908,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["go","golang","readability"],"created_at":"2024-08-01T15:02:53.838Z","updated_at":"2025-05-14T08:09:35.266Z","avatar_url":"https://github.com/go-shiori.png","language":"HTML","funding_links":["https://www.paypal.me/RadhiFadlillah","https://ko-fi.com/radhifadlillah"],"categories":["开源类库","HTML","Open source library"],"sub_categories":["文本处理","Word Processing"],"readme":"# Go-Readability [![Go Reference][go-ref-badge]][go-ref] [![PayPal][paypal-badge]][paypal] [![Ko-fi][kofi-badge]][kofi]\n\nGo-Readability is a Go package that finds the main readable content and the metadata of an HTML page. It works by removing clutter such as buttons, ads, background images, and scripts.\n\nThis package is based on [Readability.js] by [Mozilla] and written line by line to make sure it looks and works as similarly as possible. 
This way, hopefully every web page that can be parsed by Readability.js can be parsed by go-readability as well.\n\n## Table of Contents\n\n- [Table of Contents](#table-of-contents)\n- [Status](#status)\n- [Installation](#installation)\n- [Example](#example)\n- [Command Line Usage](#command-line-usage)\n- [Licenses](#licenses)\n\n## Status\n\nThis package is stable enough for use and up to date with Readability.js [v0.4.4][last-version] (commit [`b359811`][last-commit]).\n\n## Installation\n\nTo install this package, run `go get`:\n\n```\ngo get -u -v github.com/go-shiori/go-readability\n```\n\n## Example\n\nTo get the readable content from a URL, use `readability.FromURL`. It fetches the web page from the specified URL, checks whether it is readable, then parses the response to find the readable content:\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\t\"log\"\n\t\"os\"\n\t\"time\"\n\n\treadability \"github.com/go-shiori/go-readability\"\n)\n\nvar (\n\turls = []string{\n\t\t// this one is an article, so it's parse-able\n\t\t\"https://www.nytimes.com/2019/02/20/climate/climate-national-security-threat.html\",\n\t\t// while this one is not an article, so readability will fail to parse it.\n\t\t\"https://www.nytimes.com/\",\n\t}\n)\n\nfunc main() {\n\tfor i, url := range urls {\n\t\tarticle, err := readability.FromURL(url, 30*time.Second)\n\t\tif err != nil {\n\t\t\tlog.Fatalf(\"failed to parse %s, %v\\n\", url, err)\n\t\t}\n\n\t\tdstTxtFile, _ := os.Create(fmt.Sprintf(\"text-%02d.txt\", i+1))\n\t\tdefer dstTxtFile.Close()\n\t\tdstTxtFile.WriteString(article.TextContent)\n\n\t\tdstHTMLFile, _ := os.Create(fmt.Sprintf(\"html-%02d.html\", i+1))\n\t\tdefer dstHTMLFile.Close()\n\t\tdstHTMLFile.WriteString(article.Content)\n\n\t\tfmt.Printf(\"URL     : %s\\n\", url)\n\t\tfmt.Printf(\"Title   : %s\\n\", article.Title)\n\t\tfmt.Printf(\"Author  : %s\\n\", article.Byline)\n\t\tfmt.Printf(\"Length  : %d\\n\", article.Length)\n\t\tfmt.Printf(\"Excerpt : %s\\n\", 
article.Excerpt)\n\t\tfmt.Printf(\"SiteName: %s\\n\", article.SiteName)\n\t\tfmt.Printf(\"Image   : %s\\n\", article.Image)\n\t\tfmt.Printf(\"Favicon : %s\\n\", article.Favicon)\n\t\tfmt.Printf(\"Text content saved to \\\"text-%02d.txt\\\"\\n\", i+1)\n\t\tfmt.Printf(\"HTML content saved to \\\"html-%02d.html\\\"\\n\", i+1)\n\t\tfmt.Println()\n\t}\n}\n```\n\nHowever, sometimes you want to parse a URL whether or not it is an article, for example when you only want the page's metadata. To do that, download the page manually using `http.Get`, then parse it using `readability.FromReader`:\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\t\"log\"\n\t\"net/http\"\n\t\"net/url\"\n\n\treadability \"github.com/go-shiori/go-readability\"\n)\n\nvar (\n\turls = []string{\n\t\t// Both will be parse-able now\n\t\t\"https://www.nytimes.com/2019/02/20/climate/climate-national-security-threat.html\",\n\t\t// But this one will not have any content\n\t\t\"https://www.nytimes.com/\",\n\t}\n)\n\nfunc main() {\n\tfor _, u := range urls {\n\t\tresp, err := http.Get(u)\n\t\tif err != nil {\n\t\t\tlog.Fatalf(\"failed to download %s: %v\\n\", u, err)\n\t\t}\n\t\tdefer resp.Body.Close()\n\n\t\tparsedURL, err := url.Parse(u)\n\t\tif err != nil {\n\t\t\tlog.Fatalf(\"error parsing url\")\n\t\t}\n\n\t\tarticle, err := readability.FromReader(resp.Body, parsedURL)\n\t\tif err != nil {\n\t\t\tlog.Fatalf(\"failed to parse %s: %v\\n\", u, err)\n\t\t}\n\n\t\tfmt.Printf(\"URL     : %s\\n\", u)\n\t\tfmt.Printf(\"Title   : %s\\n\", article.Title)\n\t\tfmt.Printf(\"Author  : %s\\n\", article.Byline)\n\t\tfmt.Printf(\"Length  : %d\\n\", article.Length)\n\t\tfmt.Printf(\"Excerpt : %s\\n\", article.Excerpt)\n\t\tfmt.Printf(\"SiteName: %s\\n\", article.SiteName)\n\t\tfmt.Printf(\"Image   : %s\\n\", article.Image)\n\t\tfmt.Printf(\"Favicon : %s\\n\", article.Favicon)\n\t\tfmt.Println()\n\t}\n}\n\n```\n\n## Command Line Usage\n\nYou can also use `go-readability` as a command line app. 
To do that, first install the CLI:\n\n```\ngo install github.com/go-shiori/go-readability/cmd/go-readability@latest\n```\n\nNow you can use it by running `go-readability` in your terminal:\n\n```\n$ go-readability -h\n\ngo-readability is parser to fetch the readable content of a web page.\nThe source can be an url or existing file in your storage.\n\nUsage:\n  go-readability [flags] source\n\nFlags:\n  -h, --help          help for go-readability\n  -l, --http string   start the http server at the specified address\n  -m, --metadata      only print the page's metadata\n  -t, --text          only print the page's text\n```\n\n## Licenses\n\nGo-Readability is distributed under the [MIT license][mit], which means you can use and modify it however you want. However, if you make an enhancement to it, please send a pull request if possible. If you like this project, please consider donating to me via either [PayPal][paypal] or [Ko-Fi][kofi].\n\n[go-ref]: https://pkg.go.dev/github.com/go-shiori/go-readability\n[go-ref-badge]: https://img.shields.io/static/v1?label=\u0026message=Reference\u0026color=007d9c\u0026logo=go\u0026logoColor=white\n[paypal]: https://www.paypal.me/RadhiFadlillah\n[paypal-badge]: https://img.shields.io/static/v1?label=\u0026message=PayPal\u0026color=00457C\u0026logo=paypal\u0026logoColor=white\n[kofi]: https://ko-fi.com/radhifadlillah\n[kofi-badge]: https://img.shields.io/static/v1?label=\u0026message=Ko-fi\u0026color=F16061\u0026logo=ko-fi\u0026logoColor=white\n[readability.js]: https://github.com/mozilla/readability\n[mozilla]: https://github.com/mozilla\n[last-version]: https://github.com/mozilla/readability/tree/0.4.4\n[last-commit]: https://github.com/mozilla/readability/commit/b359811927a4bb2323eba085be004978fb18a926\n[mit]: 
https://choosealicense.com/licenses/mit/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgo-shiori%2Fgo-readability","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgo-shiori%2Fgo-readability","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgo-shiori%2Fgo-readability/lists"}