{"id":13621330,"url":"https://github.com/lnenad/newser","last_synced_at":"2026-01-23T13:39:40.701Z","repository":{"id":49737090,"uuid":"461490335","full_name":"lnenad/newser","owner":"lnenad","description":"Newser is a simple utility to generate a pdf with you favorite news articles","archived":false,"fork":false,"pushed_at":"2024-08-23T11:50:31.000Z","size":317,"stargazers_count":86,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-15T01:39:07.132Z","etag":null,"topics":["news","pdf-generation","scraping","supernote"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lnenad.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-20T13:06:32.000Z","updated_at":"2025-01-15T06:44:23.000Z","dependencies_parsed_at":"2024-11-08T08:42:32.475Z","dependency_job_id":null,"html_url":"https://github.com/lnenad/newser","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lnenad/newser","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lnenad%2Fnewser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lnenad%2Fnewser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lnenad%2Fnewser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lnenad%2Fnewser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lnenad","download_url":"https://codeload
.github.com/lnenad/newser/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lnenad%2Fnewser/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28693325,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-23T11:01:27.039Z","status":"ssl_error","status_checked_at":"2026-01-23T11:00:26.909Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["news","pdf-generation","scraping","supernote"],"created_at":"2024-08-01T21:01:04.813Z","updated_at":"2026-01-23T13:39:40.681Z","avatar_url":"https://github.com/lnenad.png","language":"Go","funding_links":[],"categories":["Go"],"sub_categories":[],"readme":"# Newser\n\nA simple utility that crawls news sites or other resources and downloads their content into a PDF.\n\n![Screenshot](screenshot.png \"Screenshot of a PDF on Windows\")\n\n## Building\n\nMake sure you have `config.yaml` set up and `go` available, then run `go build cmd/newser.go`, or run it directly from source with `go run cmd/newser.go`.\n\n## Configuration\n\nThe configuration file guides the PDF building process. Right now only website parsing is supported.\n\nThe configuration file must have top-level `defs` (definitions), `font` and `output` properties. 
Right now `defs` must have a `website` property that contains the website definitions.\n\nThe default config is part of the source repo.\n\n### Website Definitions\n\n```yaml\n-   index: \"index-page-url\"\n    indexSelector: \"css-selector-for-articles-index\"\n    titleSelector: \"title-selector-for-articles\"\n    linkSelector: \"selector-for-the-link-for-the-article-content\"\n    linkAttr: \"attribute-to-gather-from-link-selector\"\n    articleContainerSelector: \"article-container-selector\"\n    articleContentSelector: \"article-content-selector\"\n    ignoreString: \"if-found-in-article-article-will-be-ignored\"\n    removeElems:\n        - \"selector-in-article-html-to-remove\"\n        - \"someother-selector-in-article-html-to-remove\"\n    collectOnly: 0 # 0 if you want to collect all articles, or limit to N articles\n    disable: 0 # 1 if you want to disable this entry\n```\n\nYou can be as specific with selectors as you want. If a website has multiple sections that contain articles, you can write multiple definitions for it and collect only the articles you want. 
\n\n## Deps\n\nTop-level deps are:\n\n* fpdf - \"github.com/go-pdf/fpdf\" - For generating PDFs\n* yaml - \"gopkg.in/yaml.v2\" - For parsing YAML\n* colly - \"github.com/gocolly/colly/v2\" - For crawling websites\n\n## Contributing\n\nRight now the project mostly exists to satisfy my desire to read news on my Supernote (awesome gadget btw), so if you want to do something clever, just create a PR.\n\n## Contributors\n\n- [lnenad](https://github.com/lnenad)\n\n## Licence\n\nThe licence is free for personal use but paid for commercial use. Get in touch if you want to use the utility or code for commercial purposes.\n\n## Sponsors\n\n\n[CapSolver](https://www.capsolver.com/?utm_source=github\u0026utm_medium=banner_repo\u0026utm_campaign=scraping\u0026utm_term=newser) is an AI-powered service that automatically solves a range of CAPTCHAs, helping developers tackle CAPTCHA challenges encountered during web scraping. Whether you're extracting data from e-commerce sites, financial platforms, or social media, CapSolver supports CAPTCHAs like [reCAPTCHA V2](https://docs.capsolver.com/guide/captcha/ReCaptchaV2.html?utm_source=github\u0026utm_medium=banner_repo\u0026utm_campaign=scraping\u0026utm_term=newser), [reCAPTCHA V3](https://docs.capsolver.com/guide/captcha/ReCaptchaV3.html?utm_source=github\u0026utm_medium=banner_repo\u0026utm_campaign=scraping\u0026utm_term=newser), [hCaptcha](https://docs.capsolver.com/guide/captcha/HCaptcha.html?utm_source=github\u0026utm_medium=banner_repo\u0026utm_campaign=scraping\u0026utm_term=newser), [ImageToText](https://docs.capsolver.com/guide/recognition/ImageToTextTask.html?utm_source=github\u0026utm_medium=banner_repo\u0026utm_campaign=scraping\u0026utm_term=newser), [DataDome](https://docs.capsolver.com/guide/antibots/datadome.html?utm_source=github\u0026utm_medium=banner_repo\u0026utm_campaign=scraping\u0026utm_term=newser), 
[AWS](https://docs.capsolver.com/guide/captcha/awsWaf.html?utm_source=github\u0026utm_medium=banner_repo\u0026utm_campaign=scraping\u0026utm_term=newser), [Geetest](https://docs.capsolver.com/guide/captcha/Geetest.html?utm_source=github\u0026utm_medium=banner_repo\u0026utm_campaign=scraping\u0026utm_term=newser), [Cloudflare Turnstile](https://docs.capsolver.com/guide/antibots/cloudflare_turnstile.html?utm_source=github\u0026utm_medium=banner_repo\u0026utm_campaign=scraping\u0026utm_term=cariddi) and more. With API integration and browser extensions options, and flexible pricing packages, CapSolver adapts to diverse web scraping needs and scenarios. ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flnenad%2Fnewser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flnenad%2Fnewser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flnenad%2Fnewser/lists"}